diff --git a/text/TODO.txt b/text/TODO.txt index 5c84c1a..188460c 100644 --- a/text/TODO.txt +++ b/text/TODO.txt @@ -141,3 +141,16 @@ * ridgeCV * 2 and 4 synergies * compare EMG synergy + + +* Acquisition -> preprocessing +* Yeung: rather remove +* mean r: report both +* better source for 10-20 + + +Talk +---- +* easy introduction, otherwise already technical +* all results +* fundamentals very briefly diff --git a/text/thesis/01Introduction.tex b/text/thesis/01Introduction.tex index 1ec573b..8e51a79 100644 --- a/text/thesis/01Introduction.tex +++ b/text/thesis/01Introduction.tex @@ -2,24 +2,24 @@ \label{introduction} \section{Motivation} \label{intro:motivation} - \qq{Reading the mind} is something humanity is and always was exited about. Whatever one may think about the possibility of doing so as a human, computers have a chance to catch a glimpse of the (neuronal) activity in the human brain and interpret it.\\ - Here Electroencephalography (EEG) is used to record brain activity and try to predict arm movements from the data.\\ + \qq{Reading the mind} is something humanity is and always has been excited about. Whatever one may think about the possibility of doing so as a human, computers have a chance to catch a glimpse of the (neuronal) activity in the human brain and interpret it.\\ + Here, Electroencephalography (EEG) is used to record brain activity and try to predict arm movements from the data.\\ Using this as a Brain-Computer-Interface (BCI) holds the possibility of restoring e.g. a lost arm. This arm could be used as before by commands constructed in the brain. In a perfect application there would be no need of relearning the usage. The lost arm could just be replaced.\\ - Another opportunity this technique provides is support of retraining the usage of the natural arm after stroke. If it is possible to interpret the brainwaves the arm can be moved passively according to the commands formed in brain. 
This congruency can restore the bodies own ability to move the arm as \cite{Gomez11} show.\\ - In a slightly different context it might become possible to handle a machine (e.g. an industrial robot or mobile robots like quadrocopters) with \qq{thoughts} (i.e. brain activity) like an additional limb. One could learn to use the possibilities of the robot like the possibilities of his arm and hand to modulate something.\\ + Another opportunity this technique provides is the support of retraining the use of the natural arm after a stroke. If it is possible to interpret the brainwaves, the arm can be moved passively according to the commands formed in the brain. This congruency can restore the body's own ability to move the arm as \cite{Gomez11} show.\\ + In a slightly different context it might become possible to handle a machine (e.g. an industrial robot or mobile robots like quadrocopters) with \qq{thoughts} (i.e. brain activity) like an additional limb. One could learn to use the possibilities of the robot like the possibilities of one's arm and hand to manipulate something.\\ Similar to that application it could be possible to drive a car by brain activity. This would lower the reaction time needed to activate the brakes, for example, by direct interaction instead of using the nerves down to the leg to press the brake. - Using non-invasive methods like EEG makes it harder to get a good signal and determine its origin. However it lowers the risk of injuries and infections which makes it the method of choice for wide spread application (cf. \cite{Collinger13}). Modern versions of EEG-caps even use dry electrodes which allow for more comfort with similar predictive strength in context of movement of the whole body due to mathematical post-processing (cf. \cite{Yeung15}). 
So everybody may put on and off an EEG-cap without high costs for production or placement.\\ + Using non-invasive methods like EEG makes it harder to get a good signal of brain activity and determine its origin. However, it lowers the risk of injuries and infections, which makes it the method of choice for widespread application (cf. \cite{Collinger13}). Modern versions of EEG-caps even use dry electrodes which allow for more comfort with similar predictive strength in the context of movement of the whole body due to mathematical post-processing (cf. \cite{Yeung15}). So anybody can put an EEG-cap on and take it off without high costs for production or placement.\\ With EEG, brainwaves can be captured that let us predict intended movements. This movement prediction, however, still has some problems. Predicting synergies instead of predicting positions or movement directly may solve some of these problems, since it is closer to the concept the nervous system uses. Most likely there are no neurons in the brain for every single muscle involved in movement. Instead, synergies are activated, meaning there is coordinated co-activation of different muscles. When using synergies, only some basic movements have to be represented in the brain and can be combined for more complex movements.\\ Assuming this, it should be easier to predict synergies while we can also use them to move a robotic arm or a quadrocopter. - This improvements shall be shown in this thesis. To do so different methods of the acquisition of synergies from EMG are compared with other data and paradigms like direct prediction from EEG, EMG and low frequencies. -\section{Overview} + These improvements shall be shown in this thesis. To do so, different methods of the acquisition of synergies from EMG are compared with other data and paradigms like direct prediction from EEG, EMG and low frequencies. 
+\section{Overview}%TODO After this introduction, the scientific background and context of this work will be stated (Chapter \ref{chp:background}). This ranges from Principal Component Analysis (PCA) and Autoencoders through Support Vector Machines (SVMs) and regression to boxplots and topographical plots.\\ Material and Methods (Chapter \ref{chp:mat}) shows the work done for this thesis, beginning with the experimental design followed by the methods for data acquisition and analysis.\\ In chapter \ref{chp:results} Results we show the numerical findings of our work separated into parts on synergies, classification, regression and a topographical analysis of the brain activity.\\ - This results and their meaning will be discussed in chapter \ref{chp:dis} Discussion, which is concluded with a look in the possible future. + These results and their meaning will be discussed in chapter \ref{chp:dis} Discussion, which is concluded with a look into the possible future. - The appendix then contains a list of contents on the CD and in the repository (Appendix \ref{app:cd}) and a small documentation of the code used (Appendix \ref{app:docu}) + The appendix then contains a list of contents on the CD and in the repository (Appendix \ref{app:cd}) and a small documentation of the code used (Appendix \ref{app:docu}). diff --git a/text/thesis/02MaterialsAndMethods.tex b/text/thesis/02MaterialsAndMethods.tex index 9629819..2069f68 100644 --- a/text/thesis/02MaterialsAndMethods.tex +++ b/text/thesis/02MaterialsAndMethods.tex @@ -5,24 +5,24 @@ \section{Communication between Neurons and Machines} \subsection{Brain-Computer-Interfaces} The idea of BCIs began to spread in the 1970s when Vidal published his paper (\cite{Vidal73}).\\ - The connection between brain and computer allows to help the human in different ways. 
From implants to re-acquire hearing and sight in one direction to the commanding of machines by brainwaves or communication although having the Locked-In syndrome in the other direction a wide field of possibilities is given yet. However most applications require lots of training and are sometimes quite far from natural behavior. Binary decisions for example are usually made through an excited or relaxed mood, which can easily be detected in brain activity.\\ + The connection between brain and computer makes it possible to help humans in different ways. A wide field of possibilities already exists, from implants that restore hearing and sight in one direction to commanding machines by brainwaves or communicating despite locked-in syndrome in the other direction. However, most applications require lots of training and are sometimes quite far from natural behavior. Binary decisions, for example, are usually made through an excited or relaxed mood, which can easily be detected in brain activity.\\ \subsubsection{Methods of recording} - First approaches used invasive BCIs earlier in Animals (rodents and monkeys) later also in humans. Invasive BCIs in humans were mostly implanted when the human was under brain surgery for another reason like therapy of epilepsy. Problems of invasive BCIs are the need to cut through skull and dura mater. This can lead to infections and severe brain damage.\\ + The first approaches used invasive BCIs, initially in animals (rodents and monkeys) and later also in humans. Invasive BCIs in humans were mostly implanted when the patient was undergoing brain surgery for another reason, such as epilepsy therapy. A problem of invasive BCIs is the need to cut through skull and dura mater, which can lead to infections and severe brain damage.\\ An improvement were less invasive BCIs with e.g. 
Electrocorticography (ECoG) which is placed inside the skull but outside the dura which decreased the risk for infections massively.\\ - Measuring outside the skull entails even less risk, the dura and skull however lower the quality of the signal massively. With some improvements EEG has a spatial resolution of 2-3 cm (cf. \cite{Babiloni01}). This is quite bad compared to the single neuron one can observe with invasive methods. However since activity is usually higher in an area and not only a single neuron EEG meets the requirements here.\\ - In addition EEG is much cheaper and easier to use than other techniques. There is no need for surgery (like for invasive methods) and the hardware can be bought for less than 100\euro{} while functional Magnetic Resonance Imaging (fMRI) hardware costs far above 100,000\euro{}. This is one of the reasons EEG is far more available than other techniques. There are some inventions of younger date but not as much work has been done with them why they are not as well known and as far distributed as EEG.\\ - Another pro of EEG is that the device is head mounted. That means the user may move while measuring without high impact on the tracking of activity. This is highly necessary for any BCI used in daily life. + Measuring outside the skull entails even less risk, the dura and skull however lower the quality of the signal massively. With some improvements EEG has a spatial resolution of 2-3 cm (cf. \cite{Babiloni01}). This is quite bad compared to the single neuron one can observe with invasive methods. However, since activity is usually higher in a whole area and not only a single neuron EEG meets the requirements here.\\ + In addition, EEG is much cheaper and easier to use than other techniques. There is no need for surgery (like for invasive methods) and the hardware can be bought for less than 100\euro{} while functional Magnetic Resonance Imaging (fMRI) hardware costs far above 100,000\euro{}. 
This is one of the reasons EEG is far more available than other techniques. There are some newer inventions, but not as much work has been done with them, which is why they are not as well known and as far distributed as EEG.\\ + Another advantage of EEG is that the device is head mounted. That means the user may move while measuring without high impact on the tracking of activity. This is highly necessary for any BCI used in daily life. \subsection{EEG} - When using Electroencephalography (EEG) one measures the electrical fields on the scalp that are generated by activity of neurons in the brain. These measurements allow some interpretation about what is happening inside the skull. Here the recorded currents are used directly to train a SVM or as predictor for regression. + When using Electroencephalography (EEG), one measures the electrical fields on the scalp that are generated by the activity of neurons in the brain. These measurements allow some interpretation of what is happening inside the skull. Here the recorded currents are used directly to train an SVM or as predictor for regression. - The foundation stone for EEG was laid in 1875 when Richard Caton found electrical activity around the brain of monkeys. After testing the methods on animals in 1924 the first human EEG was recorded by Hans Berger in Jena. He also coined the term Electroencephalography and is seen as the inventor of EEG. + The foundation of EEG was laid in 1875 when Richard Caton found electrical activity around the brain of monkeys. After testing the methods on animals in 1924 the first human EEG was recorded by Hans Berger in Jena. He also coined the term Electroencephalography and is seen as the inventor of EEG. - The frequencies typically used for movement prediction in EEG are about 8-24 Hz (\cite{Blokland15},\cite{Ahmadian13},\cite{Wang09}). + The frequencies typically examined for movement prediction in EEG are about 8-24 Hz (\cite{Blokland15},\cite{Ahmadian13},\cite{Wang09}). 
EEG is often used for non-invasive BCIs because it is cheap and easier to use than e.g. fMRI. The electrodes have to be spread over the scalp. To allow for comparability there are standardized methods for this. These methods also bring a naming convention with them. \subsubsection{10-20 system} - In this standard adjacent electrodes are placed either 10\% or 20\% of the total front-back or left-right distance apart. This standardization also makes it possible to name each electrode or rather here place. This is done with capital letters for lobes (Frontal, \qq{Central}, Parietal, Occipital and Temporal) and numbers for the specific place on the lobe. Even numbers are on the right side of the head, odd on the left; larger numbers are closer to the ears, lower numbers closer to the other hemisphere. The exact number now refers to the exact distance from center: $$\left\lceil\frac{x}{2}\right\rceil\cdot \frac{d}{10}$$ where $x$ is the number and $d$ the diameter of the scalp. Electrodes in the center are named with a lower case $z$ e.g. $Cz$.\\ - Electrodes between two lobes (10\% instead of 20\% distance) are named with the both adjacent lobes (anterior first) e.g. $FCz$ (between frontal and central lobe).\\ + In this standard adjacent electrodes are placed either 10\% or 20\% of the total front-back or left-right distance apart. This standardization also makes it possible to name each electrode or rather its place. This is done with capital letters for lobes (Frontal, \qq{Central}, Parietal, Occipital and Temporal) and numbers for the specific place on the lobe. Even numbers are on the right side of the head, odd on the left; larger numbers are closer to the ears, lower numbers closer to the other hemisphere. The exact number now refers to the exact distance from center: $$\left\lceil\frac{x}{2}\right\rceil\cdot \frac{d}{10}$$ where $x$ is the number and $d$ the diameter of the scalp. Electrodes in the center are named with a lower case $z$ e.g. 
$Cz$.\\ + Electrodes between two lobes (10\% instead of 20\% distance) are named with both adjacent lobes (anterior first) e.g. $FCz$ (between frontal and central lobe).\\ The naming convention according to the 10-20 system is shown in figure~\ref{fig:10-20}. \begin{figure}[!p] \centering @@ -31,33 +31,33 @@ \label{fig:10-20} \end{figure} \subsubsection{Frequency bands} - EEG can be divided into several bands according to the frequency of the waves. Usually this also corresponds to the form of waves and allows insights in the mental state from rest to highly aroused. According to \cite{Gerrard07} following bands can be proposed: + EEG waves can be divided into several bands according to their frequencies. Usually this also corresponds to the form of the waves and allows insights into the mental state from resting to highly aroused. According to \cite{Gerrard07} the following bands can be proposed: \begin{itemize} \item Delta: 1-4 Hz \item Theta: 4-7 Hz \item Alpha: 7.5-12.5 Hz (depending also on age) \item Beta: 13-20 Hz \end{itemize} - There are different definitions of the limits of the bands, but for a rough estimation this limits are suitable. For more exact results an analysis of wave patterns would be necessary. + There are different definitions of the limits of the bands, but for a rough estimation these limits are suitable. For more exact results an analysis of wave patterns would be necessary. Mu-waves are also measured within limits similar to those of the alpha band. They are associated with mirror neurons in the motor cortex and their activity is suppressed while the subject is moving. %TODO \subsection{Low Frequencies} - In the 2000s there began a movement using new techniques to record ultrafast and infraslow brainwaves (above 50Hz and below 1Hz). These were found to have some importance (cf. \cite{Vanhatalo04}).\\ - Also in predicting movements there was found some significance in low frequency as was done by \cite{Liu11} and \cite{Antelis13} for example. 
\citeauthor{Antelis13} found correlations between hand movement and low frequency signal of $(0.29,0.15,0.37)$ in the dimensions respectively.\\ - \cite{Lew14} state low frequencies are mainly involved in spontaneous self-induced movement and can be found before the movement starts. By this they may be a great possibility to lower reaction time of neuroprostheses for example. + In the 2000s, a movement began that used new techniques to record ultrafast and infraslow brainwaves (above 50 Hz and below 1 Hz). These were found to be of some importance for movement prediction (cf. \cite{Vanhatalo04}).\\ + Low frequencies were also found to be significant for predicting movements, for example by \cite{Liu11} and \cite{Antelis13}. \citeauthor{Antelis13} found correlations between hand movement and the low frequency signal of $(0.29,0.15,0.37)$ in the respective dimensions.\\ + \cite{Lew14} state that low frequencies are mainly involved in spontaneous, self-induced movement and can be found before the movement starts. Therefore, they may offer a good opportunity to lower the reaction time of neuroprostheses, for example. \subsection{EMG} - When using muscles, they are contracted after a signal via an efferent nerve activates them. Contraction of muscles also releases measurable energy, which is used for Electromyography (EMG). There are intramuscular applications of EMG but here only used surface EMG is used.\\ + Muscles contract after a signal via an efferent nerve activates them. Contraction of muscles also releases measurable energy, which is used for Electromyography (EMG). There are intramuscular applications of EMG, but here only surface EMG is used.\\ From surface EMG, activity of muscles can be estimated, however not very precisely without repetition. Since the muscles used for arm movements are quite large in this setting, EMG allows relatively precise estimations of underlying muscle activity. 
- EMG is mainly developed for diagnostic tasks. However it is also applicable in science to track muscle activity as is done here. + EMG is mainly developed for diagnostic tasks. However, it is also applicable in science to track muscle activity as is done here. \section{Signal Processing} \subsection{Power estimation} \subsubsection{EEG} - To use data from EEG one way is to analyze the occurring frequencies and their respective power.\\ - To gain these from the continuous signal windows can be used in which the signal is finite and a Fourier transform can be estimated. For the estimation power spectral density (PSD) is used. + One way of using data from EEG is to analyze the occurring frequencies and their respective power.\\ + To gain these from the continuous signal, windows can be used, in which the signal is finite and a Fourier transform can be estimated. For the estimation power spectral density (PSD) is used. \subsubsection{Power spectral density estimation} The PSD is the power per frequency, where power refers to the square of the amplitude.\\ - If the Fourier transform is existing, PSD can be calculated from it e.g. as periodogram. If not it has to be estimated. One way to do so is fast Fourier transform (FFT), another - used here - is parametrized with an Autoregressive model (AR). For this one assumes that there is a relationship of the spectral density between $p$ consecutive samples and the following one. This leads to an equation with only $p$ parameters which can be estimated in different ways. Here Burg's method (\texttt{pburg} from \matlab{} library) is used.\\ + If the Fourier transform does exist, PSD can be calculated from it, e.g. as periodogram. If not, it has to be estimated. One way to do so is fast Fourier transform (FFT), another - used here - is by parametrization with an Autoregressive model (AR). For this, one assumes that there is a relationship of the spectral density between $p$ consecutive samples and the following one. 
This leads to an equation with only $p$ parameters which can be estimated in different ways. Here Burg's method (\texttt{pburg} from \matlab{} library) is used.\\ In Figure~\ref{fig:psd} we see the difference between autoregressive \texttt{pburg} and periodogram \texttt{pwelch} PSD estimation. \begin{figure} \includegraphics[width=\textwidth]{psd.png} @@ -67,26 +67,26 @@ \subsubsection{Burg's method - Autoregressive Model} \label{mat:burg} Burg's method (\cite{Burg75}) is a special case of parametric PSD estimation. It interprets the Yule-Walker-Equations as least squares problem and iteratively estimates solutions.\\ - According to \cite{Huang14} Burg's method fits well in cases with the need of high resolution.\\ - Burg and Levinson-Durbin algorithms are examples for PSD estimation which use an autoregressive model instead of Fast Fourier Transformation. The approach is described well by \cite{Spyers98}. The idea is to lower the number of parameters determining the production of the signal. The number of parameters used is called \textit{model order} (250 for this thesis, lower in most cases). These parameters are estimated from the original data. For PSD estimation the modeled values are used which allows easier transformation since the data is generated by an known process.\\ - Often the Rational Transfer Function Modeling is used having the general form of $$x_n=-\sum\limits_{k=1}^p a_kx_{n-k}+ \sum\limits_{k=0}^q b_ku_{n-k},$$ where $x_n$ is the output, $u_n$ the input. $a,b$ are the system parameters which have to be estimated from original data. If there is unknown input the output can only be estimated which simplifies the formula as follows $$\hat{x}_n=-\sum\limits_{k=1}^p a_k\hat{x}_{n-k}.$$ + According to \cite{Huang14} Burg's method works well in cases with the need of high resolution.\\ + Burg and Levinson-Durbin algorithms are examples for PSD estimation which use an autoregressive model instead of Fast Fourier Transformation. 
The approach is described well by \cite{Spyers98}. The idea is to lower the number of parameters determining the production of the signal. The number of parameters used is called \textit{model order} (250 for this thesis, lower in most cases). These parameters are estimated from the original data. For PSD estimation, the modeled values are used which allows easier transformation since the data is generated by a known process.\\ + Often the Rational Transfer Function Modeling is used in the general form of $$x_n=-\sum\limits_{k=1}^p a_kx_{n-k}+ \sum\limits_{k=0}^q b_ku_{n-k},$$ where $x_n$ is the output, $u_n$ the input. $a,b$ are the system parameters which have to be estimated from original data. If there is unknown input the output can only be estimated which simplifies the formula as follows $$\hat{x}_n=-\sum\limits_{k=1}^p a_k\hat{x}_{n-k}.$$ Estimating the parameters is done by minimizing the forward prediction error $E$: $$E=\frac{1}{N}\sum\limits_{i=1}^N \left(x_i-\hat{x}_i\right)^2$$ The minimum has zero slope and can be found by setting the derivative to zero:$$\frac{\partial E}{\partial a_k}=0,\text{ for } 1\le k\le p$$ This yields a set of equations called \emph{Yule-Walker-Equations} (cf. \cite{Yule27},\cite{Walker31}).\\ - Using forward and backward prediction the parameters ($a_k$) are estimated based on the Yule-Walker-Equations then. + Using forward and backward prediction the parameters ($a_k$) are then estimated based on the Yule-Walker-Equations. \subsection{Filtering} Filtering of the recorded EEG signal is necessary for different reasons. First there are current artifacts from 50Hz current. These can be filtered out with bandstop filters.\\ Secondly we need to concentrate on the interesting frequencies (for classical EEG 1-50Hz). This is done by applying lowpass or highpass filters respectively. This is necessary because the PSD of lower frequency is a lot higher than that of higher frequencies. 
The relation $$PSD(f)=\frac{c}{f^\gamma}$$ holds for constants $c$ and $\gamma$ (\cite{Demanuele07}).\\ - The Butterworth filter (\cite{Butterworth30}) was invented by Stephen Butterworth in 1930. Its advantage was uniform sensitivity to all wanted frequencies. In comparison to other filters Butterworth's is smoother because it is flat in the pass band and monotonic over all frequencies. This however leads to decreased steepness meaning a higher portion of frequencies beyond cutoff. + The Butterworth filter (\cite{Butterworth30}) was invented by Stephen Butterworth in 1930. Its advantage was uniform sensitivity to all wanted frequencies. In comparison to other filters, Butterworth's is smoother because it is flat in the pass band and monotonic over all frequencies. This, however, leads to decreased steepness meaning a higher portion of frequencies beyond cutoff. \section{Synergies} \label{back:synergies} - Movement of the arm (and other parts of the body) are under-determined meaning with given trajectory there are different muscle contractions possible. One idea how this problem could be solved by our nervous system are synergies. Proposed by Bernstein in 1967 (\cite{Bernstein67}) they describe the goal of the movement (e.g. the trajectory) instead of controlling single muscles. This would mean however that predicting the activity of single muscles from EEG is harder than predicting a synergy which in turn determines the contraction of muscles.\\ - Evidence for the use of synergies in the nervous system was found e.g. by Bizzi et al. (\cite{Bizzi08}) and Byadarhaly et al. (\cite{Byadarhaly12}). They also showed that synergies meet the necessary requirement to be able to build predictable trajectories.\\ + The movement of the arm (and other parts of the body) is under-determined meaning with given trajectory there are different muscle contractions possible. One idea how this problem could be solved by our nervous system are synergies. 
Proposed by Bernstein in 1967 (\cite{Bernstein67}), they describe the goal of the movement (e.g. the trajectory) instead of controlling single muscles. This would mean, however, that predicting the activity of single muscles from EEG is harder than predicting a synergy which in turn determines the contraction of muscles.\\ + Evidence for the use of synergies in the nervous system was found e.g. by Bizzi et al. (\cite{Bizzi08}) and Byadarhaly et al. (\cite{Byadarhaly12}). They also showed that synergies meet the necessary requirements to be able to build predictable trajectories.\\ Synergies are usually obtained from the EMG signal through a principal component analysis (PCA, cf. \ref{mat:pca}), non-negative matrix factorization (NMF, cf. \ref{mat:nmf}) or autoencoders (a form of neural network, cf. \ref{mat:autoenc}). \subsection{Principal Component Analysis} \label{mat:pca} Principal Component Analysis (PCA) is probably the most common technique for dimensionality reduction. The idea is to use those dimensions with the highest variance to keep as much information as possible in the lower-dimensional space.\\ - Invented PCA was in 1901 by Karl Pearson (\cite{Pearson01}). The intention was to get the line closest to a set of data. This line also is the one that explains most variance.\\ + PCA was invented in 1901 by Karl Pearson (\cite{Pearson01}). The intention was to get the line closest to a set of data. This line also is the one that explains most variance.\\ The PCA of data can be done in different ways. One is calculating the eigenvectors of the covariance matrix. The first principal component is the eigenvector with the highest eigenvalue. Other eigenvectors follow ordered by their eigenvalues.\\ \begin{figure} \centering @@ -94,14 +94,14 @@ \caption{Gaussian Scatter with both eigenvectors, the principal component (long arrow) explaining most, the other least variance} \label{fig:pca} \end{figure} - In Figure~\ref{fig:pca} we see the eigenvectors of the data. The longer vector is the principal component the shorter one is orthogonal to it and explains the remaining variance. The second component here also is the component which explains least variance, since most variance is orthogonal to it. + In Figure~\ref{fig:pca} we see the eigenvectors of the data. 
The longer vector is the principal component the shorter one is orthogonal to it and explains the remaining variance. The second component here also is the component which explains least variance, since most variance is orthogonal to it. + In Figure~\ref{fig:pca} we see the eigenvectors of the data. The longer vector is the principal component, the shorter one is orthogonal to it and explains the remaining variance. The second component here also is the component which explains least variance, since most variance is orthogonal to it. \subsection{Non-Negative Matrix Factorization} \label{mat:nmf} - In some applications non-Negative Matrix Factorization (NMF) is preferred over PCA (cf. \cite{Lee99}). This is because it does not learn eigenvectors but decomposes the input into parts which are all possibly used in the input. When seen as matrix factorization PCA yields matrices of arbitrary sign where one represents the eigenvectors the other the specific mixture of them. Because an entry may be negative cancellation is possible. This leads to unintuitive representation in the first matrix.\\ - NMF in contrast only allows positive entries. This leads to \qq{what is in, is in} meaning no cancellation which in turn yields more intuitive matrices. The first contains possible parts of the data, the second how strongly they are represented in the current input.\\ + In some applications non-Negative Matrix Factorization (NMF) is preferred over PCA (cf. \cite{Lee99}). This is because it does not learn eigenvectors but decomposes the input into parts which are all possibly used in the input. When seen as matrix factorization PCA yields matrices of arbitrary sign where one represents the eigenvectors and the other the specific mixture of them. Because an entry may be negative cancellation is possible. This leads to unintuitive representation in the first matrix.\\ + NMF in contrast only allows positive entries. 
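This leads to \qq{what is in, is in}, meaning no cancellation, which in turn yields more intuitive matrices. The first contains possible parts of the data, the second lists how strongly they are represented in the current input.\\ The formula for NMF is $$Input\approx \mathbf{WH}$$ - where Input is $n\times m$, $W$ is $n\times r$ and $H$ is $r\times m$ with $r<<\min\{m,n\}$. So $\mathbf{WH}$ is only an approximation of the input however with significant lower dimension - the number of Synergies used.\\ + where Input is $n\times m$, $\mathbf{W}$ is $n\times r$ and $\mathbf{H}$ is $r\times m$ with $r\ll\min\{m,n\}$. So $\mathbf{WH}$ is only an approximation of the input, however, with significantly lower dimension, namely the number of synergies used.\\ The factorization is learned with an update rule that may be chosen. The \matlab{} default, an alternating least squares (ALS) algorithm, is used here. It can be described as in algorithm \ref{alg:als} (cf. \cite{Berry07}).\\ \begin{algorithm} \begin{algorithmic} @@ -119,12 +119,12 @@ \caption{Alternating Least Squares in NMF} \label{alg:als} \end{algorithm} - This version uses some simplifications (as setting to zero to be non-negative) and an slightly improved form is used in \matlab{}.\\ - ALS usually converges faster and with an better result than multiplicative update algorithms which would be the alternative in \matlab{}. + This version uses some simplifications (such as setting negative entries to zero to stay non-negative) and a slightly improved form is used in \matlab{}.\\ + ALS usually converges faster and with a better result than multiplicative update algorithms which would be the alternative in \matlab{}. \subsection{Autoencoders} \label{mat:autoenc} - Autoencoders are a specific type of artificial neural networks (ANN). They work like typical ANNs by adjusting weights between the layers however they do not predict an unknown output but they predict their own input. What is interesting now is manipulating the size of the hidden layer where the hidden layer has to be smaller than the input layer. Now in the hidden layer the information of the input can be found in a condensed form (e.g. synergies instead of single muscle activity).\\ - Autoencoders have been successfully used by Spüler et al. to extract synergies from EMG (\cite{Spueler16}). Especially with a lower number of synergies autoencoders should perform better than PCA or NMF because linear models fail to discover the agonist-antagonist relations that are typical for muscle movements. These however can be detected by autoencoders which allows for good estimations with half the synergies.\\ + Autoencoders are a specific type of artificial neural network (ANN). They work like typical ANNs by adjusting weights between the layers; however, they do not predict an unknown output but their own input. 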
What is interesting now is manipulating the size of the hidden layer where the hidden layer has to be smaller than the input layer. Now in the hidden layer the information of the input can be found in a condensed form (e.g. synergies instead of single muscle activity).\\ - Autoencoders have been successfully used by Spüler et al. to extract synergies from EMG (\cite{Spueler16}). Especially with a lower number of synergies autoencoders should perform better than PCA or NMF because linear models fail to discover the agonist-antagonist relations that are typical for muscle movements. These however can be detected by autoencoders which allows for good estimations with half the synergies.\\ + Autoencoders are a specific type of artificial neural networks (ANN). They work like typical ANNs by adjusting weights between the layers, however, they do not predict an unknown output but their own input. What is interesting now is manipulating the size of the hidden layer where the hidden layer has to be smaller than the input layer. Now in the hidden layer the information of the input can be found in a condensed form (e.g. synergies instead of single muscle activity).\\ + Autoencoders have been successfully used by Spüler et al. to extract synergies from EMG (\cite{Spueler16}). Especially with a lower number of synergies, autoencoders should perform better than PCA or NMF because linear models fail to discover the agonist-antagonist relations that are typical for muscle movements. These, however, can be detected by autoencoders which allows for good estimations with half the synergies.\\ An autoencoder's input layer has as many neurons as there are input dimensions (e.g. one for each EMG channel). The number of hidden layer neurons may be varied, here usually 3 are used. The output layer is of the same size as the input layer. This autoencoder is shown in Figure~\ref{fig:autoenc}. 
\begin{figure} \centering @@ -134,33 +134,33 @@ \end{figure} \section{Machine Learning} \subsection{Support-Vector Machines} - Support-vector machines (SVMs) are used for classification of data. This is done by separating data in feature space by a hyperplane. Additional data is classified with respect to the site of the hyperplane it is located in feature space.\\ - This hyperplane is considered optimal if the margins on both sides (distance to the nearest data point) are maximal to allow for the maximal possible noise. This means the separating hyperplane can be constructed out of the nearest points (3 in 2-D) from both classes. This points however may be different for different attempts as an different angle in some dimension may make different points the nearest (cf. Figure~\ref{fig:hyperplanes}).\\ + Support-vector machines (SVMs) are used for classification of data. This is done by separating data in feature space by a hyperplane. Additional data is classified with respect to the side of the hyperplane that it is located on.\\ + This hyperplane is considered optimal if the margins on both sides (distance to the nearest data point) are maximal to allow for the maximum possible noise. This means the separating hyperplane can be constructed out of the nearest points (3 in 2-D) from both classes. These points, however, may be different for different attempts as a different angle in some dimension may make different points the nearest (cf.
Figure~\ref{fig:hyperplanes}).\\ \begin{figure} \centering \includegraphics[width=0.6\textwidth]{pictures/hyperplanes.png} - \caption{Two possible hyperplanes while the orange one has larger margins} + \caption{Two sets (red and blue) separated by two possible hyperplanes; the orange one has larger margins} \label{fig:hyperplanes} \end{figure} - The hyperplane is defined as $\vec{w}\cdot \vec{x}-b=0$ while the hyperplanes tangent to the classes are $\vec{w}\cdot \vec{x}-b=1$ or $\vec{w}\cdot \vec{x}-b=-1$ respectively (see Figure~\ref{fig:svm}). The margin is $\frac{2}{||\vec{w}||}$. The margins have to be maximized while all data is classified correctly, so the problem can be formulated as: + The hyperplane is defined as $\vec{w}\cdot \vec{x}-b=0$ while the hyperplanes tangential to the classes are $\vec{w}\cdot \vec{x}-b=1$ or $\vec{w}\cdot \vec{x}-b=-1$ respectively (see Figure~\ref{fig:svm}). The margin is $\frac{2}{||\vec{w}||}$. The margins have to be maximized while all data is classified correctly, so the problem can be formulated as: $$\max \frac{2}{||\vec{w}||}\ s.t.\ y_i(\vec{w}\cdot\vec{x_i}-b)\geq 1,\ i=1,\dots,n,$$ where $y_i$ is the class (+1 or -1) of the corresponding data. \begin{figure} \centering \includegraphics[width=0.6\textwidth]{pictures/svm.png} - \caption{Margins and hyperplane for a SVM (Figure by Cyc and Peter Buch)} + \caption{Margins and hyperplane for an SVM (Figure by Cyc and Peter Buch)} \label{fig:svm} \end{figure} - This prototype of a SVM is only able to separate two classes of linear separable data. For other data some improvements are necessary. + This prototype of an SVM is only able to separate two classes of linearly separable data. For other data some improvements are necessary. \subsubsection{Multiclass SVM} If there are more than two classes to separate it can be done with SVMs in different ways.
One approach is \emph{one-vs-one} meaning each pair of classes is compared by its own SVM, which votes for one or the other class. This is done for all pairs and the class with most votes is picked.\\ Another approach is \emph{one-vs-all} where every class is compared against the remaining ones. Here scores are used to determine which class matches best, i.e. in which class the data is farthest from the separating hyperplane. \subsubsection{Soft-margin SVM} If data is not separable soft-margin SVMs have to be used. They allow wrong classifications but try to minimize them. The problem can be formulated as $$\text{Minimize }\frac{1}{N}\sum\limits_{i=1}^N\max\{0,1-y_i(\vec{w}\cdot\vec{x_i}-b)\}+\lambda ||\vec{w}||^2,$$ - where $\lambda$ is the parameter that adjusts the trade-off between large margins and wrong classifications (if $\lambda$ has an higher value, there is more weight on large margins). + where $\lambda$ is the parameter that adjusts the trade-off between large margins and wrong classifications (if $\lambda$ has a higher value, there is more weight on large margins). \subsubsection{Kernel trick} - Data like in figure~\ref{fig:kernel} are not \emph{linear} separable. The idea here is to apply the \emph{kernel trick} meaning to separate the data in a higher dimensional space where they are linear separable. In the example this is accomplished by using the distance from origin as feature and separating in that space. + Data like those in figure~\ref{fig:kernel} are not \emph{linearly} separable. The idea here is to apply the \emph{kernel trick} meaning to separate the data in a higher dimensional space where they are linearly separable. In the example this is accomplished by using the distance from the origin as a feature and separating in that space.
%TODO \begin{figure} \input{pictures/kernel.tikz} \caption{Data separable with the kernel trick; left in the original space with features $x$ and $y$, right in the dimension where distance from the origin is shown and the data is linearly separable} @@ -168,24 +168,24 @@ \end{figure} Common kernels are polynomial, Gaussian and hyperbolic kernels. \subsection{Regression} - Regression is the idea of finding $\beta$ so that $$y= X\beta+\epsilon$$ where X is the $n\times p$ input matrix and y the $n\times 1$ output vector of a system. Having this $\beta$ from given input the output can be predicted.\\ - There are different ways to find this $\beta$. One common approach is the \emph{ordinary least squares}-Algorithm. $$\hat{\beta}=\arg\min\limits_{b\in\mathds{R}^p} \left(y-Xb\right)^T\left(y-Xb\right),$$ meaning the chosen $\hat\beta$ is that $b$ which produces the lowest error since $Xb$ should be - besides from noise $\epsilon$ - the same as $y$.\\ + Regression is the idea of finding $\beta$ so that $$y= X\beta+\epsilon$$ where $X$ is the $n\times p$ input matrix and $y$ the $n\times 1$ output vector of a system. Using this $\beta$, the output can be predicted from a given input.\\ + There are different ways to find this $\beta$. One common approach is the \emph{ordinary least squares}-Algorithm. $$\hat{\beta}=\arg\min\limits_{b\in\mathds{R}^p} \left(y-Xb\right)^T\left(y-Xb\right),$$ meaning the chosen $\hat\beta$ is that $b$ which produces the lowest error since $Xb$ should - apart from noise $\epsilon$ - be the same as $y$.\\ Choosing $\beta$ like this brings the risk of overfitting. The training data is matched well, new data points however are predicted poorly. This problem is addressed with RIDGE-Regression.
\subsubsection{RIDGE-Regression} \label{mm:ridge} Instead of minimizing only the error in RIDGE-Regression (also called \emph{Tikhonov regularization}) the size of vector $\beta$ is also minimized: $$\hat{\beta}=\arg\min\limits_{b\in\mathds{R}^p} \left(y-Xb\right)^T\left(y-Xb\right)+\lambda b^Tb.$$ $\lambda$ decides how big the influence of the size of $\beta$ or $b$ respectively is. \subsection{Cross Validation} - $k$-fold cross validation means splitting the data into $k$ equally sized parts, training the model on $k-1$ parts and validating on left one (see Figure~\ref{fig:crossValidation}). + $k$-fold cross validation means splitting the data into $k$ equally sized parts, training the model on $k-1$ parts and validating on the remaining one (see Figure~\ref{fig:crossValidation}). \begin{figure} \includegraphics[width=\textwidth]{pictures/K-fold_cross_validation_EN.jpg} \caption{Principle of k-fold cross validation (here $k=4$)(picture by Joan.domenech91 and Fabian Flöck)} \label{fig:crossValidation} \end{figure} This is done to achieve a measure for how good the fit is. When using cross validation all the data is used for prediction and is predicted. This eliminates effects of randomness.\\ - The reason for e.g. 10-fold cross validation is that a lot of data ($9/10$) is left for the prediction. If there is enough data one may want to use 2-fold cross validation only to lower computation times. + The reason for e.g. 10-fold cross validation is that a lot of data ($9/10$) is left for the prediction. If there is enough data, one may want to use 2-fold cross validation only to lower computation times. - Cross validation can also be nested. This is necessary for example if it is used for parameter optimization and to measure fitness or if there is more than one parameter. When nesting computation time grows exponentially in the number of nested cross validation steps (see Algorithm~\ref{alg:cv}) + Cross validation can also be nested. 
This is necessary for example if it is used for parameter optimization and to measure fitness or if there is more than one parameter. When nesting, computation time grows exponentially with the number of nested cross validation steps (see Algorithm~\ref{alg:cv}). \begin{algorithm} \begin{algorithmic} \State data $\gets$ data$\{0$...$9\}$ @@ -211,7 +211,7 @@ \section{Evaluation Methods} \subsection{Confusion Matrix} \label{mm:cm} - The confusion matrix is a visualization of classifications. In it for every class the number of samples classified as each class is shown. This is interesting since it can show bias and give a feeling for similar cases where similar is meant according to the features.\\ + The confusion matrix is a visualization of classifications. In it, for every class, the number of samples classified as each class is shown. This is interesting, as it can show bias and give a feeling for similar cases where similar is meant according to the features.\\ In the 2-class case the well known table of true and false positives and negatives (table~\ref{tab:tptnftfn}) is a confusion matrix. From it we can learn specificity and sensitivity as follows: $$\text{sensitivity}=TP/(TP+FN)$$ $$\text{specificity}=TN/(TN+FP)$$ @@ -228,7 +228,7 @@ \caption{2D confusion matrix} \label{tab:tptnftfn} \end{table} - In the higher dimensional case \matlab{} uses color coded maps as figure~\ref{fig:exampleCM}. In this thesis scaled confusion matrices are used where each row adds up to 1. + In the higher dimensional case \matlab{} uses color coded maps as seen in figure~\ref{fig:exampleCM}. In this thesis scaled confusion matrices are used where each row adds up to 1.
\begin{figure} \includegraphics[width=\textwidth]{pictures/results/cmEEGfull.png} \caption{Example for a color coded confusion matrix with 5 classes} @@ -236,7 +236,7 @@ \end{figure} \subsection{ANOVA} Analysis of Variance (ANOVA) is a way of checking if there is a main effect of a variable.\\ - The Hypotheses tested are that all group means are equal ($H_0$) or they are not ($H_1$). To check on those ANOVA compares the deviation from the over-all mean and compares it to the deviation within the groups. If a lot of variance in the data can be explained by the groups (meaning in-group variance is lower than variance between groups) it is quite likely that the proposed groups have different means.\\ + The hypotheses tested are that all group means are equal ($H_0$) or they are not ($H_1$). To test these, ANOVA compares the deviation of the group means from the overall mean with the deviation within the groups. If a lot of variance in the data can be explained by the groups (meaning in-group variance is lower than variance between groups) it is quite likely that the proposed groups have different means.\\ Whether this is significant is decided based on the $p$-value, representing the probability of observing, under $H_0$, a difference between in-group and between-group variance at least as large as the one found. $H_0$ is rejected if $p$ is lower than a defined threshold (often $0.05$, $0.01$ or $0.001$). \subsection{Boxplot} To plot data and show their distribution boxplots are used. @@ -259,12 +259,11 @@ \end{figure} Of the kinematic information tracked only position ($x,y$) and angle ($\theta$, rotation around $z$-axis) of the hand are used.\\ Only complete sessions were used for this analysis to ensure better comparability.\\ - One session consists of 5 runs with 40 trials each. The trials were separated by resting phases of varying length (2-3s, randomly assigned). Each trial began with an auditory cue specifying the random but equally distributed target for this trial.
This leads to 50 reaches to the same target each session. - After the auditory cue the participants should \qq{perform the movement and return to the starting position at a comfortable pace but within 4 seconds} (\cite{Shiman15})\\ - For each subject there were 4 to 6 sessions, each recorded on a different day. All in all there were 255 runs in 51 sessions. Each session was analyzed independently as one continuous trial. - \subsection{Environment for evaluation} - The calculations were done on Ubuntu \texttt{14.04 / 3.19.0-39} with \matlab{} \texttt{R2016a (9.0.0.341360) 64-bit (glnxa64) February 11, 2016}. + One session consists of 5 runs with 40 trials each. The trials are separated by resting phases of varying length (2-3s, randomly assigned). Each trial is a grasp to one of four targets and begins with an auditory cue specifying the random but equally distributed target for this trial. This leads to 50 reaches to the same target each session.\\ + After the auditory cue the participants should \qq{perform the movement and return to the starting position at a comfortable pace but within 4 seconds} (\cite{Shiman15}).\\ + For each subject there are 4 to 6 sessions, each recorded on a different day. All in all there are 255 runs in 51 sessions. Each session is analyzed independently as one continuous trial. \section{Data Acquisition} + All the processing was done on Ubuntu \texttt{14.04 / 3.19.0-39} with \matlab{} \texttt{R2016a (9.0.0.341360) 64-bit (glnxa64) February 11, 2016}. \subsection{Loading of data} The data recorded with BCI2000 (\cite{Schalk04}) can be loaded into \matlab{} with a specific \texttt{.mex} file. The corresponding precompiled \texttt{.mex} files for some platforms (Windows, Mac, Linux) are available from BCI2000.\\ The signal, together with the corresponding state data and the parameters, is loaded as shown in Algorithm~\ref{alg:load_bcidat}.
@@ -278,7 +277,7 @@ \label{alg:load_bcidat} \end{algorithm} \subsubsection{Signal} - The signal is loaded as matrix of 41 channels (see Table~\ref{tab:channelNames}). All the values are integers corresponding to the voltage and also can be loaded as floating point values representing microvolts. Since the representation should not make any difference when analyzing the spectrum the smaller representation as integers is used here. + The signal is loaded as a matrix of 41 channels (see Table~\ref{tab:channelNames}). All the values are integers corresponding to the voltage and can also be loaded as floating point values representing microvolts. Since the representation should not make any difference when analyzing the spectrum, the smaller representation as integers is used here. \begin{table} \centering \begin{tabular}{c|c|l} @@ -330,10 +329,10 @@ \label{tab:channelNames} \end{table} \subsubsection{States} - The main information contained by the \texttt{states} \matlab{}\texttt{-struct} is the currently presented stimulus. The \texttt{struct} has same length as the signal so that for every entry in the signal there is corresponding state information.\\ - There were some adjustments necessary since it did not match the movements (cf. Section~\ref{mm:newClass}). + The main information contained by the \texttt{states} \matlab{}\texttt{-struct} is the currently presented stimulus. The \texttt{struct} has the same length as the signal so that for every entry in the signal there is corresponding state information.\\ + There were some adjustments necessary since the states did not match the movements (cf. Section~\ref{mm:newClass}). 
\subsubsection{Parameters} - All the settings from the BCI2000 recording are saved in and loaded from the \texttt{parameters}.\\ + All the settings from the BCI2000 recording are saved and loaded as \texttt{parameters}.\\ Examples are the names of the channels, the random seed for BCI2000 and the sounds, meaning and duration for different stimuli. \subsection{Filtering} For filtering I use second order Butterworth filters. @@ -355,7 +354,7 @@ \end{enumerate} For filtering \matlab{}s \texttt{filtfilt} function is used to reduce shift due to multiple filtering steps. \subsection{Windowing} - To process continuous EEG or EMG data it is necessary to define windows. These windows are shifted over the data to get discrete values for the further steps.\\ + To process continuous EEG or EMG data it is necessary to define windows. These windows are shifted over the data to obtain discrete values for the further steps.\\ Defaults for EEG are: \begin{itemize} \item window $1 s$ @@ -375,12 +374,13 @@ \subsection{Classification} \label{mm:newClass} Very bad results when classifying EMG into Move/Rest made me further inspect the data. The actual movement came quite a while after the stimulus.\\ - To address this problem I did a re-classification of the data according to actual movements (cf. Appendix~\ref{code:classify}). To decide whether the subject is moving or not the mean EMG activity (from Waveform Length) is compared to a threshold (10,000 by default).\\ - If there is movement the class occurring most in the second before is defined as the current task. If there is movement but the stimulus tells to rest, the last active stimulus is assigned.\\ - In addition the second before movement onset is taken out of the data (classified as -1) and (optionally) half a second before movement onset as belonging to the following stimulus (cf. 
\ref{mat:pause}).\\ - Finally some smoothening is done by taking the most occurring class one second before to one second after the current time step as its class.\\ + To address this problem I did a re-classification of the data according to actual movements (cf. Appendix~\ref{code:classify}).\\ + To decide whether the subject is moving or not the mean EMG activity (from Waveform Length) is compared to a threshold (10,000 by default).\\ + If there is movement, the class occurring the most in the second before is defined as the current task. If there is movement but the stimulus indicates resting, the last active stimulus is assigned.\\ + In addition the second before movement onset is removed from the data (classified as -1) and (optionally) half a second before movement onset as belonging to the following stimulus (cf. \ref{mat:pause}).\\ + Finally some smoothing is done by taking the most frequently occurring class from the second before and after the current time step as its class.\\ As last step the length of the stimulus-vector is adjusted to the length of the EEG data.\\ - According to this classification only data points are taken in the further analysis that are classified different than -1 meaning they are either clear rest or clear movement. + According to this classification only data points are taken in the further analysis that are classified differently to -1 meaning they are either clear rest or clear movement. \subsection{Synergies} \label{mat:synergies} Synergies are generated based on different options for dimensionality reduction (cf. \ref{back:synergies}).\\ @@ -388,9 +388,9 @@ Using PCA this is done by taking the first $n$ components. Then the EMG data is transformed into the $n$-dimensional space spanned by the components.\\ NMF is done with $n$ as inner dimension. 
Then EMG data is multiplied with the resulting matrix to transform it to $n$-dimensional data.\\ Autoencoders, finally, are trained with a hidden layer of size $n$ and afterwards EMG data is encoded with the learned weights. This is equivalent to taking the hidden layer activity for the corresponding time step.\\ - Since synergies are generated from EMG they have the same dimensionality in the first dimension\footnote{only depending on window size and shift for EMG data and the recoding duration} and $n$ in the second. + Since synergies are generated from EMG, they have the same dimensionality in the first dimension\footnote{only depending on window size and shift for EMG data and the recording duration} and $n$ in the second. \subsection{Kinematics} - Kinematic data is used either as movement or as position. The position was directly recorded, the movement is the first derivative of the position in time.\\ + Kinematic data is used either as movement or as position. The position was directly recorded, the movement is calculated as the first derivative of the position in time.\\ The recording of kinematics was started after that of EEG. In the synchronization channel\footnote{cf. Table~\ref{tab:channelNames}} there is a peak when kinematic recording is started. This was used to align movement with EEG and EMG data. In addition the kinematic data is adjusted to the EMG window and shift to be able to use corresponding data for the same time step. This was done by summing all differences (for movement) or by calculating the mean position in the time window.\\ The size of this data is the same as that of EMG and Synergies in length, but it has only three features per time step since only 3D positioning ($x,y$ and $\theta$) of the hand and no information about the fingers are used.
\section{Data Analysis} @@ -403,7 +403,7 @@ \end{figure} \subsection{Classification} In addition to the regressions, classifications were done to have a benchmark and a possibility to compare with results from other work. - Classification can be done in different ways. First approach is discriminating Movement from Rest. This is done by training an SVM and testing its results with 10-fold cross validation. Here this is done with EMG, EEG and LF data. EMG in this setting is trivial since it was the basis for the classification (cf. \ref{mm:newClass}).\\ + Classification can be done in different ways. The first approach is discriminating Movement from Rest. This is done by training an SVM and testing its results with 10-fold cross validation. Here this is done with EMG, EEG and LF data. EMG in this setting is trivial since it was the basis for the classification (cf. \ref{mm:newClass}).\\ In a second step discrimination of movement in different directions is done, also with an SVM trained on EMG, EEG or LF data respectively. The fit of the model is also checked with 10-fold cross validation.\\ For unbiased classification it is necessary to train with equally sized classes. For that purpose and to lower computation time only 250 (as default) samples per class are taken in.\\ The parameter $c$ for the support vector machine is found with an additional step of cross validation or set to 1. (Results in Section~\ref{res:maxC}).\\ @@ -412,9 +412,9 @@ \subsection{Regression} \subsubsection{Predicting Kinematics} The prediction of kinematics is done with ridge regression. Since there are more data for kinematics than for EEG the mean position or movement are used and predicted.\\ - The regression is done in 10-fold cross validation for each dimension ($x,y,\theta$) and the parameter $\lambda$ (cf. ~\ref{mm:ridge}) is ascertained with an additional cross validation. 
The resulting correlation is the mean correlation of each of the 10 parts with the best parameter lambda each while the correlation for each dimension is calculated independently. + The regression is done in 10-fold cross validation for each dimension ($x,y,\theta$) and the parameter $\lambda$ (cf. ~\ref{mm:ridge}) is ascertained with an additional cross validation. The resulting correlation is the mean correlation of each of the 10 parts with the best parameter $\lambda$ each while the correlation for each dimension is calculated independently. \subsubsection{Predicting Synergies} - Predicting synergies works similar as for the kinematics. Only change is that the synergies may have other dimensionality. Nevertheless each synergy is predicted from all EEG data as one output and correlation is calculated for each synergy. + Predicting synergies works similar as for the kinematics. The only difference is that the synergies may have other dimensionality. Nevertheless each synergy is predicted from all EEG data as one output and correlation is calculated for each synergy. \subsubsection{Predicting EMG} When predicting EMG data the sum of the waveform length in the time corresponding to the EEG data is used. As the EMG data was summed to gain the data this is a reasonable approach.\\ The remaining steps are the same as for kinematics and Synergies. diff --git a/text/thesis/03Results.tex b/text/thesis/03Results.tex index 35bf9de..8ffc3d4 100644 --- a/text/thesis/03Results.tex +++ b/text/thesis/03Results.tex @@ -12,8 +12,8 @@ When calculating an Analysis of Variance (ANOVA) on the data with and without pause we get $p<0.001$. \subsubsection{Confusion Matrix} - A confusion matrix shows whether there is systematic error in classification.\\ - In figure \ref{fig:cmEMG} there is the confusion matrix for EMG data. Since EMG works well for classifying Move/Rest there is also one where only the decision which movement is present is shown. 
In the second plot we see that many movements are classified as class 3. Especially those belonging to class 2. + A confusion matrix shows whether there is a systematic error in the classification.\\ + Figure \ref{fig:cmEMG} shows the confusion matrix for EMG data. Since EMG works well for classifying Move/Rest there is also one where only the decision which movement is present is shown. In the second plot we see that many movements, especially those belonging to class 2, are classified as class 3. \begin{figure}[p] \centering \includegraphics[width=\textwidth]{pictures/results/cmEMGfull.png} @@ -22,15 +22,15 @@ \label{fig:cmEMG} \end{figure} \subsection{Regression} - Using an offset or not does not make any difference since the offset is only applied on EEG-data (cf. \ref{mat:offset}).\\ + Whether an offset is used or not does not make any difference since the offset is only applied to EEG-data (cf. \ref{mat:offset}).\\ Predicting synergies from EMG does not make sense since they are computed from EMG (cf. \ref{mat:synergies}).\\ Predictions of velocities and positions from EMG are quite poor. - The prediction of the $y$-dimension is a bit better than $x$ ($p<0.05$) for velocities. For positions there is no significant difference ($p>0.1$). Predicting $\theta$ is worse significantly ($p<0.001$) for positions and velocities (also see tables \ref{tab:corrKin} and \ref{tab:corrPos}). + The prediction of the $y$-dimension is slightly better than $x$ ($p<0.05$) for velocities. For positions there is no significant difference ($p>0.1$). Predicting $\theta$ is significantly worse ($p<0.001$) for positions and velocities (also see tables \ref{tab:corrKin} and \ref{tab:corrPos}). There is no significant effect of the use of a pause when predicting velocities from EMG ($p>0.1$). \section{EEG} \subsection{Classification} - In figure~\ref{fig:overviewEEG} the different settings for classification based on EEG-data are shown. Default has values as in \ref{mat:default}. The runs with pause leave out the data 1 second before the movement begins (cf. \ref{mat:pause}). Runs with offset have an offset of 1 or 2 (cf. \ref{mat:offset}). + Figure~\ref{fig:overviewEEG} shows the different settings for classification based on EEG-data. Default has values as in \ref{mat:default}.
The runs with pause leave out the data 1 second before the movement begins (cf. \ref{mat:pause}). Runs with offset have an offset of 1 or 2 (cf. \ref{mat:offset}). + Figure~\ref{fig:overviewEEG} shows the different settings for classification based on EEG-data. Default has values as in \ref{mat:default}. The runs with pause leave out the data 1 second before the movement begins (cf. \ref{mat:pause}). Runs with offset have an offset of 1 or 2 (cf. \ref{mat:offset}). \begin{figure} \centering \includegraphics[width=\textwidth]{pictures/results/overviewEEGclass.png} @@ -38,7 +38,7 @@ \label{fig:overviewEEG} \end{figure} \subsubsection{Confusion Matrix} - In figure \ref{fig:cmEEGFull} there is the confusion matrix for EEG. It shows a main diagonal with relatively high values, the right class is chosen more often than other classes. + Figure \ref{fig:cmEEGFull} shows the confusion matrix for EEG. It shows a main diagonal with relatively high values, the right class is chosen more often than other classes. \begin{figure}[p] \centering \includegraphics[width=\textwidth]{pictures/results/cmEEGfull.png} @@ -51,7 +51,7 @@ \label{res:offsetEEG} Offset makes no significant difference when predicting Synergies (Autoencoder: $p>0.1$, PCA: $p>0.1$, NMF: $p>0.1$) or velocities ($p>0.1$) or positions ($p>0.1$). \subsubsection{Pause} - Whether there is a pause of 1s or only 0.5s doesn't make a significant difference for Autoencoder ($p>0.1$), PCA ($p>0.1$), NMF ($p>0.1$) or Velocities ($p>0.1$). + Whether there is a pause of 1s or only 0.5s makes no significant difference for Autoencoder ($p>0.1$), PCA ($p>0.1$), NMF ($p>0.1$) or Velocities ($p>0.1$). \subsubsection{EMG} For comparison also EMG was predicted from EEG. The results are shown in figure \ref{fig:EEGemg}. There are no significant differences between the channels ($p>0.1$). 
\begin{figure} @@ -70,7 +70,7 @@ \label{fig:overviewLF} \end{figure} \subsubsection{Confusion Matrix} - In figure \ref{fig:cmLFFull} there is the confusion matrix for LF. It shows a main diagonal with relatively high values, the right class is chosen more often than other classes. However there are also quite high values for Rest as class. + Figure \ref{fig:cmLFFull} shows the confusion matrix for LF. It shows a main diagonal with relatively high values, i.e. the right class is chosen more often than other classes. However, there are also quite high values for Rest as class. \begin{figure}[p] \centering \includegraphics[width=\textwidth]{pictures/results/cmLFfull.png} @@ -102,7 +102,7 @@ \caption{EEG, EMG and LF compared based on classification accuracy with 5 classes} \label{fig:classEEGemgLF} \end{figure} - The mean classification accuracys for the default run are are given in Table~\ref{tab:accs}. + The mean classification accuracies for the default run are given in Table~\ref{tab:accs}. \begin{table} \centering \begin{math} @@ -115,7 +115,7 @@ min&35.7&37.2&26.2 \end{array} \end{math} - \caption{Accuracys in \% for the different methods of recording in default configuration} + \caption{Accuracies in \% for the different methods of recording in default configuration} \label{tab:accs} \end{table} \subsection{Regression} @@ -207,8 +207,8 @@ TODO%TODO \subsection{RIDGE-Regression} In tables \ref{tab:ridgeParamEMGkin}, \ref{tab:ridgeParamHighKin} and \ref{tab:ridgeParamAO6Kin} we find the number of 'wins' for each parameter\footnote{\ref{tab:ridgeParamHighKin} and \ref{tab:ridgeParamAO6Kin} were calculated with an order for Burg's method of 50 instead of the later default of 250}. A 'win' refers to a run where this $\lambda$ scored the highest correlation.\\ - For EMG there is no clear preference but it seems like 100 should work as parameter. For EEG we see a clear preference for $\lambda=100$.
Low Frequencies seem to prefer a lower parameter about 10 however this was only evaluated for one session. - For all other runs $\lambda = 100$ is used for all methods, better results might be possible with a parameter adapted better. + For EMG there is no clear preference but it seems like 100 should work as parameter. For EEG we see a clear preference for $\lambda=100$. Low Frequencies seem to prefer a lower parameter of about 10 however this was only evaluated for one session. + For all other runs $\lambda = 100$ is used for all methods, better results might be possible with a better adapted parameter. \begin{table} \centering \begin{math} @@ -238,15 +238,20 @@ \begin{table} \centering \begin{math} + \hfill\begin{array} + {r||c|c|c|c} + \lambda&0.001 & 0.01 & 0.1 & 1 \\\hline + EEG & 0&0&0& 30\\ + \end{array}\hfill \begin{array} - {r||c|c|c|c||c|c|c|c} - \lambda&0.001 & 0.01 & 0.1 & 1 & 1 & 5 & 10 & 100\\\hline - EEG & 0&0&0& 30 & 0 & 0 & 1 & 29\\ - EMG & & & & & 7 & 9 & 10 & 4\\ - LF& & & & & 1 & 13 & 14 & 2 - \end{array} + {r||c|c|c|c} + \lambda& 1 & 5 & 10 & 100\\\hline + EEG & 0 & 0 & 1 & 29\\ + EMG & 7 & 9 & 10 & 4\\ + LF& 1 & 13 & 14 & 2 + \end{array}\hfill \end{math} - \caption{Number of sessions in which the according $\lambda$ was chosen as best parameter when doing ridge regression to predict velocities from EEG, EMG or LF (run on AO6 only)\\ + \caption{Number of sessions in which the according $\lambda$ was chosen as best parameter when doing ridge regression to predict velocities from EEG, EMG or LF (run on one session with one subject (subject AO, session 6) only)\\ Low ($\lambda\le 1$) values were only tested for EEG in a separate run} \label{tab:ridgeParamAO6Kin} \end{table} @@ -254,15 +259,15 @@ \subsection{Number of Synergies} \label{res:noSyn} To determine the number of synergies to use, all EMG data is predicted with each technique and each number of synergies from itself. 
The result is the plot in figure~\ref{fig:noSyn}.\\ - The plot tells that 2 and 4 synergies are good values for Autoencoders, for default nevertheless 3 synergies are used here because there are also 3 dimensions of kinematics and so it is more comparable. Three is also the most efficient number of Synergies for PCA and NNMF (cf. Section \ref{dis:noSyn}).\\ + The plot shows that 2 and 4 synergies are good values for Autoencoders, for default, nevertheless, 3 synergies are used here because there are also 3 dimensions of kinematics and so it is more comparable. 3 is also the most efficient number of Synergies for PCA and NNMF (cf. Section \ref{dis:noSyn}).\\ \begin{figure} \centering \includegraphics[width=\textwidth]{pictures/results/noSyn.png} \caption{Self prediction accuracy of EMG with 1 to 6 synergies. Each channel of EMG and the mean performance is shown. We see a lowering of the slope at 2 and 4 synergies for Autoencoders and at 3 synergies for PCA and NMF} \label{fig:noSyn} \end{figure} - When comparing the results of prediction via different number of synergies, 2 synergies perform significantly ($p<0.01$) worse than 3 and 4. Between 3 and 4 synergies there is no significant difference ($p>0.1$).\\ - For each method of synergy generation alone the performance of 2 synergies is not significantly ($p>0.05$) worse. Only the over-all performance with more data becomes significant. + When comparing the results of prediction via different numbers of synergies, 2 synergies perform significantly ($p<0.01$) worse than 3 and 4. Between 3 and 4 synergies there is no significant difference ($p>0.1$).\\ + For each method of synergy generation alone, the performance of 2 synergies is not significantly ($p>0.05$) worse. Only the over-all performance with more data becomes significant. \subsection{Autoencoder} In table~\ref{tab:corrAutoenc} the correlations for velocities and positions predicted from Autoencoder are given. 
The data for the Autoencoder were calculated from recorded EMG data. \begin{table} @@ -281,7 +286,7 @@ \label{tab:corrAutoenc} \end{table} \subsubsection{Comparison with EMG} - When compared to the original 6D EMG data as a predictor a 3D autoencoder is only significantly worse when predicting positions ($p<0.05$), not for velocities ($p>0.1$). + When compared to the original 6D EMG data as a predictor a 3D autoencoder is only significantly worse at predicting positions ($p<0.05$), not for velocities ($p>0.1$). \begin{figure} \includegraphics[width=\textwidth]{pictures/results/EMGautoencPos.png} \caption{Predicting positions from EMG (left) or Autoencoder (right)} @@ -306,7 +311,7 @@ \label{fig:directViaPos} \end{figure} \subsubsection{EMG} - There is a significant difference between predicting EMG from EEG directly or via Autoencoders ($p<0.001$, see figure~\ref{fig:directViaEMG}). The prediction via Autoencoders performs a bit worse (mean $r\sim 0.03$). + There is a significant difference between predicting EMG from EEG directly or via Autoencoders ($p<0.001$, see figure~\ref{fig:directViaEMG}). The prediction via Autoencoders performs slightly worse (mean correlation $0.23$ (EMG) vs. $0.20$ (Autoencoder)). \begin{figure} \centering \includegraphics[width=\textwidth]{pictures/results/predictEMGfromEEG.png} diff --git a/text/thesis/04Discussion.tex b/text/thesis/04Discussion.tex index 7a3b2d1..ebd36fe 100644 --- a/text/thesis/04Discussion.tex +++ b/text/thesis/04Discussion.tex @@ -2,56 +2,56 @@ \label{chp:dis} \section{EMG} \label{dis:emg} - Predictions of velocities and positions are quite bad from EMG. A correlation about $0.2$ cannot be used for a BCI. There might be some way of improving the predictions and it might be predicted several times to heighten the correlations, finding another approach however is more promising.\\ + Predictions of velocities and positions are quite bad from EMG. A correlation about $0.2$ cannot be used for a BCI. 
There might be some way of improving the predictions and it might be predicted several times to improve the correlations, finding another approach however is more promising.\\ Additionally in many cases a BCI is needed, there are no EMG signals since the muscles do not work as they should (e.g. after stroke) and so do not generate the corresponding activity. - Out of these reasons I only use EMG as benchmark for other approaches: If the muscles would work this correlation could be reached. + Because of these reasons I only use EMG as benchmark for other approaches: If the muscles would work this correlation could be reached. \section{EEG} \label{dis:eeg} Predictions from EEG to velocities and position are significantly better than those from EMG (see tables \ref{tab:pCorr},~\ref{tab:corrKin},~\ref{tab:pCorrPos} and \ref{tab:corrPos}).\\ - This might be because EMG has a hard time classifying the different movements due to massive activity while moving. This can be seen in the confusion matrix (\ref{fig:cmEMG}). Many data points belonging to class 2 or 4 are classified as 3 in error. The classification between movement and rest however works fine.\\ + This might be because EMG has a hard time classifying the different movements due to massive activity while moving. This can be seen in the confusion matrix (\ref{fig:cmEMG}). Many data points belonging to class 2 or 4 are classified as 3 by mistake. The classification between movement and rest however works fine.\\ All in all few samples are classified as class 2 even though the training was done on a balanced set. This could mean that features of class 2 are found in other classes too and by that do not have strong predictive power. - The classification results I found for EEG are similar to them of \cite{Shiman15}. They found a classification accuracy of 39.5\%, I have a mean classification of 40.4\% and for targets only (no rest) even 43.6\%. 
However my findings for discrimination of movement and rest in EEG are a lot worse ($64.6\%$ vs. $38.5\%$). This is probably due to the training of the SVM since when only discriminating between movement and rest my setup also scores $57.1\%$. Higher frequencies with more muscle artifacts (cf. \ref{res:topo} and \ref{dis:topo}) are more predictive for the difference between movement or rest however less predictive for the different classes. So the SVM when trained to distinguish the classes gives higher predictive power to the lower frequencies. + The classification results I found for EEG are similar to those of \cite{Shiman15}. They found a classification accuracy of 39.5\%, I have a mean classification of 40.4\% and for targets only (no rest) even 43.6\%. However my findings for discrimination of movement and rest in EEG are a lot worse ($64.6\%$ vs. $38.5\%$). This is probably due to the training of the SVM since when only discriminating between movement and rest my setup also scores $57.1\%$. Higher frequencies with more muscle artifacts (cf. \ref{res:topo} and \ref{dis:topo}) are more predictive for the difference between movement or rest, however less predictive for the different classes. So the SVM when trained to distinguish the classes gives higher predictive power to the lower frequencies. When predicting velocities or positions from EEG there is no significant difference between $x$ and $y$. The difference between $x$ and $y$ and the angle $\theta$ is larger for velocities than for absolute positions since the absolute angle prediction is a lot better than the prediction of change.\\ - This again is an indication that the actual position is more important for the activity in brain than the change of position as itself. + This again is an indication that the actual position is more important for the activity in brain than the change of position itself. 
\section{Low Frequencies} \label{dis:lf} My findings concerning low frequencies are a lot less promising than e.g. in \cite{Lew14}.\\ - The reason for that might be that the movements were not self induced but extrinsically motivated by a cue. \citeauthor{Lew14} however use low frequencies exactly for the purpose to detect voluntary movement.\\ + The reason for that might be that the movements were not self induced but extrinsically motivated by a cue. \citeauthor{Lew14} however use low frequencies exactly for the purpose of detecting voluntary movement.\\ I show that the use of low frequencies (at least as I did it here) has no advantage over the use of EMG (see table \ref{tab:pCorr}). This might also be a hint that movement artifacts have the biggest part in low frequencies while moving. This however makes it impossible to use them for continuous tasks.\\ - Low frequencies are great to early detect voluntary movement but are not applicable in this configuration. + Low frequencies are great to detect voluntary movement early but are not applicable in this configuration. - Which is interesting nevertheless is, that low frequencies also occur in rest. Quite some of the movements are classified as rest (see figure \ref{fig:cmLFFull}). If a sample is classified correctly as movement it is quite likely that is is also classified correctly - however with an preference on class 3 again. + What is interesting nevertheless is that low frequencies also occur in rest. Quite a few movements are classified as rest (see figure \ref{fig:cmLFFull}). If a sample is classified correctly as movement it is quite likely that it is also classified correctly - however with a preference for class 3 again.
This matches the understanding of low frequencies as pre-movement activation mainly belonging to voluntary movement. The subjects probably plan all the possible movements while in rest to execute once the stimulus is shown. \section{Velocities and Positions} \label{dis:velPos} - I expected better performance when predicting velocities instead of absolute positions. The findings however show the opposite. The performance is quite a lot better when predicting positions directly.\\ + I had expected better performance when predicting velocities instead of absolute positions. The findings, however, show the opposite. The performance is quite a lot better when predicting positions directly.\\ This might mean that both EMG and EEG data carry more information about the actual configuration and not only about the change. - My findings do not match them of \cite{Ofner12} who found similar results for position and movement. However the task used by \citeauthor{Ofner12} was different from ours as they used self-motivated movement. + My findings do not match those of \cite{Ofner12} who found similar results for position and movement. However, the task used by \citeauthor{Ofner12} was different from ours as they used self-motivated movement. \section{Pause} \label{dis:pause} The use of a 1 second pause before movement onset only shows significant differences for classification of EMG and predicting Synergies from low frequencies. \subsection{Classification from EMG} - When doing classification from EMG data there is an great improvement when leaving the whole second before movement onset out. This results from the way the classification was done, since the beginning of movement is defined based on the EMG data.\\ + When doing classification from EMG data there is a great improvement when removing the whole second before movement onset.
This results from the way the classification was done, where the beginning of movement is determined based on the EMG data.\\ What this finding shows is that the threshold is chosen well to classify the movements. \subsection{Synergies from Low Frequencies} - There is a significant improvement when taking pre-movement data in for predicting Synergies from low frequency data. This shows again what \cite{Lew14} proposed; in low frequency features we find mainly pre-movement activation. Activity while moving is probably mostly occluded by motor commands. + There is a significant improvement when taking pre-movement data in for predicting Synergies from low frequency data. This supports what \cite{Lew14} proposed; in low frequency features we find mainly pre-movement activation. Activity while moving is probably mostly occluded by motor commands. \section{Offset} \label{dis:offset} - Applying an offset when using EEG-data or not does not make a significant difference. This is probably due to the configuration with EEG windows as large as 1 second. If smaller windows were used an offset could help, in my setup there is no difference. + Whether an offset is applied or not does not make a significant difference. This is probably due to the configuration with EEG windows as large as 1 second. If smaller windows were used, an offset could help, in my setup there is no difference. \section{Synergies} \label{dis:synergies} \subsection{Number of Synergies} \label{dis:noSyn} - As shown in section~\ref{res:noSyn} 2 and 4 Synergies are good values for Autoencoder since the slope of the mean prediction is steeper before than after. Another neuron doesn't improve the result as much as the last.\\ + As shown in section~\ref{res:noSyn} 2 and 4 Synergies are good values for Autoencoder since the slope of the mean prediction is steeper before than after. 
An additional synergy does not improve the result as much as the one before did.\\ For PCA and NNMF this value is reached at 3 as figure \ref{fig:noSyn} shows. The findings in the evaluation of the performance of different numbers of synergies show that 2 synergies are quite few but nevertheless have some predictive power. 4 synergies are no great improvement compared to 3 synergies.\\ This means doing the analyses with 3 synergies should give a representative picture of the performance of synergies. \subsection{Autoencoder, PCA or NMF} - In many applications the synergies computed with different methods perform similar, however some differences can be found. + In many applications the synergies computed with different methods perform similarly, however some differences can be found. \subsubsection{Prediction from EEG} PCA data is predicted from EEG significantly worse than e.g. autoencoder data ($p<0.001$). Between NMF and autoencoder there is no significant difference.\\ So autoencoder and NMF are to prefer when looking for good predictability from EEG. @@ -61,18 +61,18 @@ \subsection{Prediction via Synergies} Of course the prediction via Synergies is a bit worse than direct prediction, since the machine learning techniques could do the same dimensionality reduction and also much more.\\ This decrease however is not large which suggests that synergies are a valid step in between.\\ - In addition the prediction of synergies from EEG are significantly ($p<0.05$) better than the prediction of EMG. So the representation as synergies probably matches the representation in the brain better. This could mean that the controlling of a prostheses should be done via synergies - representing the representation in the brain and being easier to implement than a prosthesis listening to (e.g.) 32 EEG channels. + In addition, the prediction of synergies from EEG is significantly ($p<0.05$) better than the prediction of EMG.
So the modeling as synergies probably matches the representation in the brain better. This could mean that the controlling of a prosthesis should be done via synergies - representing the representation in the brain and being easier to implement than a prosthesis listening to (e.g.) 32 EEG channels. \subsection{Comparison with EMG} - The results show that the dimensionality reduction from 6 dimensional EMG to 3 dimensional Synergies (here via autoencoder) does not cost much information when predicting velocities and positions.\\ + The results show that the dimensionality reduction from 6 dimensional EMG to 3 dimensional synergies (here via autoencoder) does not cost much information when predicting velocities and positions.\\ For velocities there is no significant difference and even for positions the mean only differs about $0.03$ (EMG: $0.23$, Autoencoder: $0.20$).\\ - For the use of Synergies this is a great sign: Most of the information being present in the muscle activity can be condensed to few synergies. This strongly supports the idea of synergies.\\ - Figure \ref{fig:predictEMGSyn} shows that Synergies can be predicted better from EEG than EMG. Part of this effect may be explained by lower dimensionality however this is not the only reason since PCA is predicted similarly well as EMG. Another explanation is that Synergies represent an intermediate step between EEG and EMG. They are lacking some of the instability and noise of EMG and at the same time are more focused than the EEG signal. + For the use of synergies this is a great sign: Most of the information being present in the muscle activity can be condensed to few synergies. This strongly supports the idea of synergies.\\ + Figure \ref{fig:predictEMGSyn} shows that synergies can be predicted better from EEG than EMG. Part of this effect may be explained by lower dimensionality however this is not the only reason since PCA is predicted about as well as EMG.
Another explanation is that synergies represent an intermediate step between EEG and EMG. They are lacking some of the instability and noise of EMG and at the same time are more focused than the EEG signal. \section{Topographical information} \label{dis:topo} - In the beta channel (see figure \ref{fig:topoBeta}) we see high activity in the right hemisphere. This is probably an artifact of muscle movements since the commands to drive the right arm should be produced in the left hemisphere.\\ - However - as we see in prediction from EMG - the muscle activity is not very predictive for the direction of movement. EEG is even better than EMG meaning there has to be more information the decision is based on than movement artifacts.\\ + In the beta channel (see figure \ref{fig:topoBeta}) we see high activity in the right hemisphere. This is probably an artifact of muscle movements since the commands driving the right arm should be produced in the left hemisphere.\\ + However - as we see in prediction from EMG - the muscle activity is not very predictive for the direction of movement. EEG is even better than EMG meaning there must be more information the decision is based on than movement artifacts.\\ This information for example can be found in the alpha band (see figure \ref{fig:topoAlpha}). Here we see clear activation in the left hemisphere and no impact of movement artifacts since the right hemisphere shows no prominent differences in activation.\\ - What is interesting in the alpha band is that main activation is measured in the occipital lobe usually associated with visual processing. Since the cue was presented auditory the findings support the idea of the dorsal pathway. This pathway is often called \qq{Where Path} or sometimes \qq{How Path} of visual processing opposing to the ventral \qq{What Path} (cf. \cite{Ungerleider82}). The dorsal pathway is said to be involved in reaching tasks. This is supported by the findings. 
+ What is interesting in the alpha band is that main activation is measured in the occipital lobe, which is usually associated with visual processing. Since the cue was presented auditorily the findings support the idea of the dorsal pathway. This pathway is often called \qq{Where Path} or sometimes \qq{How Path} of visual processing as opposed to the ventral \qq{What Path} (cf. \cite{Ungerleider82}). The dorsal pathway is said to be involved in reaching tasks. This is supported by the findings. - When comparing reaches to different targets there are also differences in other brain regions. For example when comparing classes 2 and 4 we find differences differences in anterior regions of the parietal lobe (see figure \ref{fig:topoAlpha24}).\\ - Here the difference in activation is found in the expected area: the premotoric regions. When predicting movements a focus should be laid on this region, here different movements can be discriminated. The main difference between movement and rest are found in the occipital lobe, for a BCI also this region should be monitored. + When comparing reaches to different targets there are also differences in other brain regions. For example when comparing classes 2 and 4 we find differences in anterior regions of the parietal lobe (see figure \ref{fig:topoAlpha24}).\\ + Here the difference in activation is found in the expected area: the premotor regions. When predicting movements, the focus should be on this region, as here different movements can be discriminated. The main differences between movement and rest are found in the occipital lobe, so for a BCI this region should also be monitored.
diff --git a/text/thesis/05Future.tex b/text/thesis/05Future.tex index b506d68..079b54e 100644 --- a/text/thesis/05Future.tex +++ b/text/thesis/05Future.tex @@ -1,27 +1,27 @@ \section{Future Work} \label{chp:fut} \subsection{Classification} - My results in the topic of classification are not very reliable since I did the classification based on EMG (cf. section \ref{mm:newClass}). It would be interesting to analyze data where the stimulus is matched to the EEG signal and check for early detectability (e.g. with low frequencies as \cite{Lew14}).\\ + My results in the topic of classification are not very reliable since I did the classification based on EMG (cf. section \ref{mm:newClass}). It would be interesting to analyze data where the stimulus is matched to the EEG signal and check for early detectability (e.g. with low frequencies as in \cite{Lew14}).\\ Additionally classification - which is enough for some tasks - could be compared to regression. If there is only a limited set of movements a robotic prosthesis has to perform, it could use classification. This should give a lower error rate since the different movements can be distinguished better. \subsection{Measurement of error} - For comparison of regression and classification it could be interesting to introduce another measure for performance than just classified correctly or not. It could be interesting how much the predicted movement differs from the real even in the classification task. In that way one would get a measure to decide whether using classification instead of regression pays off.\\ + For comparison of regression and classification it could be interesting to introduce a measure of performance other than just classified correctly or not. It could be interesting to see how much the predicted movement differs from the real event in the classification task.
In that way one would get a measure to decide whether using classification instead of regression pays off.\\ For this analysis also a variable number of classes would be interesting since having 4 movements (as in this setting) is not enough to use an artificial arm. \subsection{Offset} There is no significant effect of an offset in my configuration. When using smaller EEG windows however there might be one. This could be tried in further analyses with small EEG windows.\\ These small windows however will probably bring other problems as e.g. unstable transformation into Fourier space. So if it is necessary to use large windows, an offset is unnecessary. \subsection{Use of EEG channels} - To achieve higher performance it would be interesting to identify those EEG channels that contribute most in an good estimation of arm movements or position. There should be channels that do not carry much information for those differentiations, this however has to be explored better.\\ + To achieve higher performance it would be interesting to identify those EEG channels that contribute most to a good estimation of arm movements or position. There should be channels that do not carry much information for those differentiations, this however has to be explored further.\\ In this context research could also be done to find out which frequencies allow for the best predictions. My findings predict a better performance for the alpha band and occipital and parietal regions. A more detailed work on this specific topic however is necessary to decide based on more data. \subsection{Self-chosen movement} - For a better use of low frequency features my work could be re-done with data recorded when subjects move voluntarily. This might also influence the way synergies are predicted and could lead to an better prediction.\\ - Additionally this task matches the requirements for an BCI better, as movement in daily life is more voluntary than decided by a single auditory cue. 
+ For a better use of low frequency features my work could be re-done with data recorded when subjects move voluntarily. This might also influence the way synergies are predicted and could lead to a better prediction.\\ + Additionally this task matches the requirements for a BCI better, as movement in daily life is more voluntary than decided by a single auditory cue. \subsection{Synergies} \subsubsection{Generation of Synergies} - This thesis shows the plausibility of synergies so the next step could be to improve the acquisition. Generating them from EMG may include unnecessary information. The generation of synergies as an intermediate step between EEG (or generally brain activity) and EMG (or generally muscle activity) may achieve even better results.\\ - A dimensionality reduction in EEG only probably will not work since there is to much unrelated activity, EMG only bears the problem of lower fit to the movement as is shown above.\\ + This thesis shows the plausibility of synergies so the next step could be to improve the acquisition. Generating them from EMG could include unnecessary information. The generation of synergies as an intermediate step between EEG (or generally brain activity) and EMG (or generally muscle activity) could achieve even better results.\\ + A dimensionality reduction in EEG only probably will not work since there is too much unrelated activity, EMG only bears the problem of lower fit to the movement as is shown above.\\ An idea could be to try a dimensionality reduction on EEG of parts of the brain known to be involved in arm movement. This however is a far less general approach than the methods I used.\\ A more general approach would be a neural network trained to predict EMG from EEG. The hidden layer of this network again could be used as synergies. \subsubsection{Autoencoders} - I did not find significantly better performance of autoencoders even with only 2 synergies. 
Since this was not the focus of the work here that might however be possible. Additional research is needed to answer which method is best to generate synergies. + I did not find significantly better performance of autoencoders even with only 2 synergies. Since this was not the focus of the work, it might however be possible. Additional research is needed to answer which method is best to generate synergies. diff --git a/text/thesis/Bfunctions.tex b/text/thesis/Bfunctions.tex index 678c06b..0ec910d 100644 --- a/text/thesis/Bfunctions.tex +++ b/text/thesis/Bfunctions.tex @@ -1,6 +1,6 @@ \chapter{Documentation of the Code} \label{app:docu} -The documentation of the Code will be split into parts according to the usage. in this parts the order will be alphabetically in the name of the function. +The documentation of the code will be split into parts according to the use. In these parts the functions are ordered alphabetically by name. \section{\texttt{callAll.m}} \texttt{callAll.m} and \texttt{callAllPos.m} are the central point in the corresponding calculations. From this script every other function is called and the parameters are defined here.\\ @@ -21,26 +21,26 @@ Here the reclassification as described in section~\ref{mm:newClass} is done. For each data point it is decided whether it belongs to movement or not according to the given threshold in EMG activity. If not there is no movement so the class is set to 0 (rest).\\ - If there is movement it is decided whether there should be according to the given classification. If not the old class is applied also for this point. If yes it is checked whether the movement just started (up to now class was 0). If the movement just started, 1 second before is taken out (pause \true) or half second before is classified same, 0.5s to 1s is dropped (pause \false). + If there is movement, it is decided whether there should be according to the given classification. If not, the old class is applied also for this point.
If yes, it is checked whether the movement has just started (up to now class was 0). If the movement has just started, 1 second before is taken out (pause \true) or half a second before is classified as the same class, 0.5s to 1s is dropped (pause \false). \subsection{\texttt{generateTrainingData.m}} \label{code:generate} In this function the transformation of EEG and EMG signals is done as PSD with Burg's method for EEG and waveform length for EMG. Additionally velocities are computed from kinematic data. \subsection{\texttt{generateTrainingDataPos.m}} - Same as \ref{code:generate} but kinematic data is used as is as positions. + Same as \ref{code:generate} but kinematic data is used as positions. \subsection{\texttt{myDownsample.m}} \label{code:myDown} \texttt{myDownsample} takes the given number of equidistant samples to represent the input. \subsection{\texttt{namesAndNumbers.m}} - \texttt{namesAndNubers} returns names and numbers of subjects and runs according to the file given (created by \ref{code:run.bash}) + \texttt{namesAndNumbers} returns names and numbers of subjects and runs according to the file given (created by \ref{code:run.bash}) \subsection{\texttt{readAll.m}} \label{code:readAll} This is the central function for the acquisition of data. - First the name of the generated file is composed out of the given parameters. In this way the acquisition step only has to be done once.\\ - If the file is not existing yet, it is created in the following steps:\\ + First, the name of the generated file is composed out of the given parameters. In this way the acquisition step only has to be done once.\\ + If the file does not exist yet, it is created in the following steps:\\ Data from BCI2000 is read along the corresponding kinematic information. Then this data is transformed in the form we want to use it (cf. \texttt{generateTrainingData} \ref{code:generate}). The data from each of the five runs (cf. 
section~\ref{mm:design}) is aggregated in one variable per modality. - As a next step the classification is done using \texttt{classifyAccordingToEMG.m} (\ref{code:classify}). The result is then smoothed and adjusted to the length of EEG data. + Next, the classification is done using \texttt{classifyAccordingToEMG.m} (\ref{code:classify}). The result is then smoothed and adjusted to the length of EEG data. Finally the kinematics and the synergies are generated matching the size of EMG data. All is then saved under the given path as \texttt{.mat} file. \subsection{\texttt{readAllPos.m}} @@ -65,7 +65,7 @@ Calculates the waveform length for the whole EMG signal calling \texttt{waveformLength.m} on defined windows \section{Data Analysis} \subsection{\texttt{correlation2.m}} - Calculates the squared correlation between each corresponding columns of two matrices.\\ + Calculates the squared correlation between the corresponding columns of two matrices.\\ This is used as a measure for fit when comparing the number of synergies. \subsection{\texttt{kFoldCV.m}} \label{code:kfoldCV} diff --git a/text/thesis/thesis.tex b/text/thesis/thesis.tex index 91fb1f0..ea58129 100644 --- a/text/thesis/thesis.tex +++ b/text/thesis/thesis.tex @@ -114,8 +114,8 @@ \section*{Abstract} \addcontentsline{toc}{section}{Abstract} -Synergies are patterns of muscle activation where muscles are used in a coordinated way and not each muscle has to be activated separately. Theory is that these patterns can be found in the brain and its activation.\\ -This thesis shows the plausibility of synergies as an intermediate step between brain and muscles. The results show only small decrease in predicting performance for position and velocity compared to the Electromyography (EMG) signal. 
This was achieved with synergies acquired through dimensionality reduction from EMG signal.\\ +Synergies are patterns of muscle activation where muscles are used in a coordinated way, so that not each muscle has to be activated separately. These patterns can be found in the brain and its activation.\\ +This thesis shows the plausibility of synergies as an intermediate step between the brain and the muscles. The results show only a small decrease in predicting performance for position and velocity compared to the Electromyography (EMG) signal. This was achieved with synergies acquired through dimensionality reduction from EMG signal.\\ The results of prediction of, via and from synergies are compared with other techniques currently used to predict movement from Electroencephalography (EEG) in a classification and regression context. Over all synergies perform not much worse than EMG and are predicted better from EEG.\\ Also comparison of different methods for the acquisition of synergies is done. The findings show that autoencoders are a great possibility to generate synergies from EMG. Synergies from non-Negative Matrix Factorization also perform well, those acquired by Principal Component Analysis are performing worse when being predicted from EEG.