\chapter{Materials and Methods}
\label{mat}
\section{Scientific background}
\label{mat:background}
\subsection{BCIs}
The idea of BCIs began to spread in the 1970s when Vidal published his seminal paper (\cite{Vidal73}).\\
The first approaches used invasive BCIs, initially in animals (rodents and monkeys) and later also in humans. In humans, invasive BCIs were mostly implanted while the patient was undergoing brain surgery for another reason, such as the treatment of epilepsy. The main problem of invasive BCIs is the need to cut through skull and dura mater, which can lead to infections and severe brain damage.\\
An improvement came with less invasive techniques such as ECoG, where the electrodes are placed inside the skull but outside the dura, which reduces the risk of infection considerably.\\
Measuring outside the skull entails even less risk; however, dura and skull lower the quality of the signal considerably. Even with recent improvements, EEG has a spatial resolution of 2-3 cm (cf. \cite{Babiloni01}). This is poor compared to the single-neuron resolution achievable with invasive methods. For our task, however, we are more interested in the activity of whole areas than in single cells, so EEG meets our requirements.\\
In addition, EEG is much cheaper and easier to use than other techniques. There is no need for surgery (as for invasive methods) and the hardware can be bought for less than 100\euro{}, while fMRI hardware costs far above 100,000\euro{}. This is one of the reasons EEG is far more available than other techniques. There are more recent inventions, but less work has been done with them, which is why they are neither as well known nor as widely distributed as EEG.\\
Another advantage of EEG is that the device is head-mounted: the user may move during the measurement without strongly affecting the recording of activity. This is essential for any BCI used in daily life.
\subsection{EEG}
With electroencephalography (EEG) one measures the electrical fields on the scalp that are generated by the activity of neurons in the brain. These measurements allow some interpretation of what is happening inside the skull. In our application we use the recorded signals directly to train an SVM or as predictors for regression.
The foundation stone for EEG was laid in 1875 when Richard Caton found electrical activity around the brain of monkeys. After the method had been tested on animals, the first human EEG was recorded by Hans Berger in Jena in 1924. He also coined the term electroencephalography and is regarded as the inventor of EEG.
The frequencies typically used for movement prediction in EEG are about 8-24 Hz (\cite{Blokland15},\cite{Ahmadian13},\cite{Wang09}).
EEG is often used for non-invasive BCIs because it is cheap and easier to use than e.g. fMRI. The electrodes have to be distributed over the scalp; to allow for comparability there are standardized placement schemes, which also come with a naming convention.
\subsubsection{10-20 system}
In this standard, adjacent electrodes are placed either 10\% or 20\% of the total front-back or left-right distance apart. This standardization also makes it possible to name each electrode, or rather each position. This is done with capital letters for the lobes (Frontal, \qq{Central}, Parietal, Occipital and Temporal) and numbers for the specific position over the lobe. Even numbers are on the right side of the head, odd numbers on the left; larger numbers are closer to the ears, lower numbers closer to the midline. The number refers to the distance from the centre: $$\left\lceil\frac{x}{2}\right\rceil\cdot \frac{d}{10},$$ where $x$ is the number and $d$ the diameter of the scalp. Electrodes on the centre line are named with a lower case $z$, e.g. $Cz$.\\
Electrodes between two lobes (10\% instead of 20\% distance) are named after both adjacent lobes (anterior first), e.g. $FCz$ (between frontal and central lobe).
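As a worked example of this scheme: electrode $C3$ lies over the central region on the left side of the head, at a distance of $\left\lceil\frac{3}{2}\right\rceil\cdot\frac{d}{10}=\frac{2d}{10}$, i.e. 20\% of $d$, from the central electrode $Cz$.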
Also see figure~\ref{fig:10-20}.
\begin{figure}[!p]
\centering
\includegraphics[width=\textwidth]{eeg_electrodes_10-20.png}
\caption{Full 10-20 system}
\label{fig:10-20}
\end{figure}
\subsection{Power estimation}
\subsubsection{EEG}
One way to use data from EEG is to analyze the occurring frequencies and their respective power.\\
There are different methods to obtain these from the continuous signal. The intuitive approach would be to use the Fourier transform; however, the Fourier transform need not exist for a continuous signal. Therefore we used power spectral density (PSD) estimation.
\subsubsection{Power spectral density estimation}
The PSD describes how the power of a signal is distributed over frequency, where power refers to the square of the amplitude.
If the Fourier transform exists, the PSD can be calculated from it, e.g. as a periodogram. If not, it has to be estimated. One way to do so is a parametric approach with an autoregressive (AR) model. Here one assumes that each sample depends linearly on the $p$ preceding samples. This leads to an equation with only $p$ parameters, which can be estimated in different ways. We used Burg's method (\texttt{pburg} from the \matlab{} library).\\
In Figure~\ref{fig:psd} we see the difference between the autoregressive \texttt{pburg} and the periodogram-based \texttt{pwelch} PSD estimates.
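To illustrate the difference between the two estimators, a small \matlab{} sketch is given below. It assumes a vector \texttt{x} holding one EEG channel; the sampling rate, model order and FFT length are illustrative values, not necessarily those used for Figure~\ref{fig:psd}.
\begin{verbatim}
fs    = 500;                    % assumed sampling rate in Hz
nfft  = 1024;                   % FFT length for both estimates
order = 250;                    % AR model order (see text)

% parametric PSD estimate via Burg's method (autoregressive model)
[pxxBurg,  f] = pburg(x, order, nfft, fs);

% non-parametric estimate via Welch's averaged periodogram
[pxxWelch, f] = pwelch(x, [], [], nfft, fs);

% compare both on a logarithmic power scale
figure; hold on;
plot(f, 10*log10(pxxBurg));
plot(f, 10*log10(pxxWelch));
legend('pburg (AR)', 'pwelch (periodogram)');
xlabel('Frequency (Hz)'); ylabel('PSD (dB/Hz)');
\end{verbatim}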
\begin{figure}
\includegraphics[width=\textwidth]{psd.png}
\caption{PSD with Autoregressive model and FFT respectively\protect\footnotemark}
\label{fig:psd}
\end{figure}
\footnotetext{The signal was unfiltered EEG data from channel \textit{Cz}, second run of the second session of subject AO}
\subsubsection{Burg's method - Autoregressive Model}
\label{mat:burg}
Burg's method (\cite{Burg75}) is a special case of parametric PSD estimation. It interprets the Yule-Walker equations as a least squares problem and iteratively estimates solutions.\\
According to \cite{Huang14}, Burg's method is well suited for cases where high resolution is needed.\\
The Burg and Levinson-Durbin algorithms are examples of PSD estimation where an autoregressive model is used instead of the Fast Fourier Transform. The approach is described well by Spyers-Ashby et al. (\cite{Spyers98}). The idea is to reduce the number of parameters that determine how the signal is generated. The number of parameters used is called the \textit{model order} (250 in our example, lower in most cases). These parameters are estimated from the original data. For the PSD estimation the modeled values are used, which allows an easier transformation since the data is then generated by a known process.\\
Often rational transfer function modeling is used, which has the general form $$x_n=-\sum\limits_{k=1}^p a_kx_{n-k}+ \sum\limits_{k=0}^q b_ku_{n-k},$$ where $x_n$ is the output and $u_n$ the input. $a,b$ are the system parameters, which have to be estimated from the original data. As the input is unknown in our application, the output can only be estimated, which simplifies the formula as follows: $$\hat{x}_n=-\sum\limits_{k=1}^p a_k\hat{x}_{n-k}.$$
Estimating the parameters is done by minimizing the forward prediction error $E$: $$E=\frac{1}{N}\sum\limits_{i=1}^N \left(x_i-\hat{x}_i\right)^2$$
The minimum has zero slope and can be found by setting the derivatives to zero: $$\frac{\partial E}{\partial a_k}=0,\quad\text{for } 1\le k\le p.$$
This yields a set of equations called the \emph{Yule-Walker equations} (cf. \cite{Yule27},\cite{Walker31}).\\
The parameters $a_k$ are then estimated from the Yule-Walker equations using both forward and backward prediction.
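A minimal \matlab{} sketch of this estimation step, assuming a signal vector \texttt{x}; \texttt{arburg} implements Burg's method and returns the coefficients $a_k$ used in the prediction formula above, the model order of 250 is taken from the text.
\begin{verbatim}
p = 250;                         % model order (see text)
[a, e] = arburg(x, p);           % a = [1, a_1, ..., a_p]; e = noise variance

% one-step forward prediction:  xhat(n) = -sum_k a_k * x(n-k)
xhat = filter([0, -a(2:end)], 1, x);

% empirical forward prediction error E (cf. the formula above)
E = mean((x - xhat).^2);
\end{verbatim}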
\subsection{Low Frequencies}
In the 2000s, new recording techniques made it possible to study ultrafast and infraslow brain activity (above 50Hz and below 1Hz), which was found to carry relevant information (cf. \cite{Vanhatalo04}).\\
Low frequencies were also found to be relevant for movement prediction, for example by Liu et al. (\cite{Liu11}) and Antelis et al. (\cite{Antelis13}). Antelis et al. found correlations between hand movement and the low-frequency signal of $(0.29,0.15,0.37)$ for the respective dimensions.\\
Lew et al. (\cite{Lew14}) state that low frequencies are mainly involved in spontaneous, self-induced movement and can already be observed before the movement starts. They might therefore offer a way to reduce the reaction time of neuroprostheses, for example.
\subsection{Filtering}
Filtering of the recorded EEG signal is necessary for several reasons. First, there are artifacts from the 50Hz power line. These can be filtered out with bandstop filters.\\
Second, we need to concentrate on the frequencies of interest (1-50Hz for classical EEG). This is done by applying highpass and lowpass filters respectively. It is necessary because the PSD at lower frequencies is much higher than at higher frequencies; the relation $$PSD(f)=\frac{c}{f^\gamma}$$ holds for constants $c$ and $\gamma$ (\cite{Demanuele07}).\\
The Butterworth filter (\cite{Butterworth30}) was invented by Stephen Butterworth in 1930. Its advantage is a uniform sensitivity to all frequencies in the passband: compared to other filters, the Butterworth filter is smoother because it is flat in the passband and monotonic over all frequencies. This, however, comes at the cost of a less steep roll-off, meaning a larger portion of frequencies beyond the cutoff passes the filter.
\subsection{EMG}
Muscles contract after a signal via an efferent nerve activates them. The contraction of muscles also produces measurable electrical activity, which is used for electromyography (EMG). There are intramuscular applications of EMG, but we only used surface EMG.\\
From surface EMG the activity of muscles can be estimated, although not very precisely without repetitions. Since the muscles used for arm movements are quite large, in our setting surface EMG allows relatively precise estimates of the underlying muscle activity.
EMG was mainly developed for diagnostic purposes, but it is also used in research to track muscle activity.
\subsection{Synergies}
Movements of the arm (and of other parts of the body) are under-determined, meaning that for a given trajectory different muscle contractions are possible. One idea of how this problem could be solved by our nervous system are synergies. Proposed by Bernstein in 1967 (\cite{Bernstein67}), they describe the goal of the movement (e.g. the trajectory) instead of controlling single muscles. This would mean, however, that predicting the activity of single muscles from EEG is harder than predicting a synergy, which in turn determines the contraction of the muscles.\\
Evidence for the use of synergies in the nervous system was found e.g. by Bizzi et al. (\cite{Bizzi08}) and Byadarhaly et al. (\cite{Byadarhaly12}). They also showed that synergies meet the requirements necessary to build predictable trajectories.\\
Synergies are usually extracted from the EMG signal through principal component analysis (PCA, cf. \ref{mat:pca}), non-negative matrix factorization (NMF, cf. \ref{mat:nmf}) or autoencoders (a form of neural network, cf. \ref{mat:autoenc}).
\subsection{PCA}
\label{mat:pca}
Principal Component Analysis (PCA) is probably the most common technique for dimensionality reduction. The idea is to keep those dimensions with the highest variance in order to preserve as much information as possible in the lower dimensional space.\\
PCA was invented in 1901 by Karl Pearson (\cite{Pearson01}). The intention was to find the line closest to a set of data points; this line is also the one that explains most of the variance.\\
The PCA of data can be computed in different ways. One is to calculate the eigenvectors of the covariance matrix: the first principal component is the eigenvector with the largest eigenvalue, the other eigenvectors follow ordered by their eigenvalues.\\
\begin{figure}
\centering
\includegraphics[width=0.7\textwidth]{GaussianScatterPCA.jpg}
\caption{Eigenvectors of Gaussian scatter}
\label{fig:pca}
\end{figure}
In Figure~\ref{fig:pca} we see the eigenvectors of the data. The longer vector is the first principal component; the shorter one is orthogonal to it and explains the remaining variance. The second component here is also the component that explains the least variance, since most of the variance is orthogonal to it.
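A sketch of this eigenvector-based computation in \matlab{}, assuming a data matrix \texttt{X} with one observation per row; the built-in \texttt{pca} function gives the same result up to the sign of the components.
\begin{verbatim}
% X: n x d data matrix, one observation per row
Xc = bsxfun(@minus, X, mean(X, 1));   % center the data
C  = cov(Xc);                         % d x d covariance matrix

[V, D] = eig(C);                      % columns of V are eigenvectors
[eigVals, idx] = sort(diag(D), 'descend');
V = V(:, idx);                        % components ordered by eigenvalue

scores = Xc * V;                      % data projected onto the components

% the same result (up to sign) via the built-in function:
% [coeff, score] = pca(X);
\end{verbatim}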
\subsection{NMF}
\label{mat:nmf}
In some applications Non-negative Matrix Factorization (NMF) is preferred over PCA (cf. \cite{Lee99}). This is because it does not learn eigenvectors but decomposes the input into parts, all of which may actually occur in the input. Seen as a matrix factorization, PCA yields matrices of arbitrary sign, where one represents the eigenvectors and the other the specific mixture of them. Because an entry may be negative, cancellation is possible, which leads to an unintuitive representation in the first matrix.\\
NMF, in contrast, only allows non-negative entries. This leads to a \qq{what is in, is in} property, meaning no cancellation, which in turn yields more intuitive matrices: the first contains possible parts of the data, the second how strongly they are represented in the current input.\\
The formula for NMF is
$$\mathbf{A}\approx \mathbf{WH},$$
where the input $\mathbf{A}$ is $n\times m$, $\mathbf{W}$ is $n\times r$ and $\mathbf{H}$ is $r\times m$ with $r\ll\min\{m,n\}$. So $\mathbf{WH}$ is only an approximation of the input, however with a significantly lower dimension: $r$, the number of synergies used.\\
The factorization is learned with an update rule that may be chosen. We used the \matlab{} default, an alternating least squares (ALS) algorithm. It can be described as in Algorithm~\ref{alg:als} (cf. \cite{Berry07}).\\
\begin{algorithm}
\begin{algorithmic}
\State initialize $\mathbf{W}$ randomly
\State $i\gets0$
\While{$i <$ numberOfIterations}
\State Solve $\mathbf{W^TWH=W^TA}$ for $\mathbf{H}$
\State set all negative values in $\mathbf{H}$ to zero
\State Solve $\mathbf{HH^TW^T=HA^T}$ for $\mathbf{W}$
\State set all negative values in $\mathbf{W}$ to zero
\State $i\gets i+1$
\EndWhile
\Return $\mathbf{W}$
\end{algorithmic}
\caption{Alternating Least Squares in NMF}
\label{alg:als}
\end{algorithm}
This version uses some simplifications (such as setting negative values to zero to enforce non-negativity); a slightly improved form is used in \matlab{}.\\
ALS usually converges faster and to a better result than multiplicative update algorithms, which would be the alternative in \matlab{}.
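In \matlab{} this factorization can be computed with \texttt{nnmf}. A minimal sketch, assuming the input matrix \texttt{A} as above and three synergies; the iteration limit is an arbitrary illustrative value.
\begin{verbatim}
r = 3;                                   % number of synergies
opts = statset('MaxIter', 100);          % iteration limit for ALS

% W is n x r, H is r x m, so that A is approximated by W*H
[W, H] = nnmf(A, r, 'algorithm', 'als', 'options', opts);

% relative reconstruction error of the factorization
reconstructionError = norm(A - W*H, 'fro') / norm(A, 'fro');
\end{verbatim}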
\subsection{Autoencoders}
\label{mat:autoenc}
Autoencoders are a specific type of artificial neural networks (ANNs). They work like typical ANNs by adjusting weights between the layers; however, they do not predict an unknown output but reconstruct their own input. The interesting part is the hidden layer, whose size is chosen smaller than the input size. The hidden layer then contains the information of the input in a condensed form (e.g. synergies instead of single muscle activity).\\
Autoencoders have been used successfully by Spüler et al. to extract synergies from EMG (\cite{Spueler16}). Especially with a lower number of synergies, autoencoders should perform better than PCA or NMF, because linear models fail to discover the agonist-antagonist relations that are typical for muscle movements. These, however, can be detected by autoencoders, which allows for good estimates with half the number of synergies.\\
An autoencoder's input layer has as many neurons as there are input dimensions (e.g. one for each EMG channel). The number of hidden layer neurons may be varied; we mostly used 3. The output layer is of the same size as the input layer. Such an autoencoder is shown in Figure~\ref{fig:autoenc}.
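A sketch of how such an autoencoder could be trained with \matlab{}'s \texttt{trainAutoencoder} (Neural Network Toolbox), which expects one observation per column; the matrix \texttt{emgFeatures} and all parameter values are illustrative assumptions, not necessarily the settings used in this work.
\begin{verbatim}
% emgFeatures: channels x timeWindows matrix (one observation per column)
hiddenSize = 3;                          % number of synergies

autoenc = trainAutoencoder(emgFeatures, hiddenSize);

synergyActivations = encode(autoenc, emgFeatures);   % 3 x timeWindows
reconstruction     = predict(autoenc, emgFeatures);  % channels x timeWindows

% mean squared reconstruction error
mseRec = mean((emgFeatures(:) - reconstruction(:)).^2);
\end{verbatim}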
\begin{figure}
\centering
\input{autoencoder.tikz}
\caption{Autoencoder (6-3-6)}
\label{fig:autoenc}
\end{figure}
\subsection{Support-vector Machines}
Support-vector machines (SVMs) are used for the classification of data. This is done by separating the data in feature space by a hyperplane. New data points are classified according to the side of the hyperplane on which they lie in feature space.\\
This hyperplane is considered optimal if the margins on both sides (the distance to the nearest data point) are maximal, to allow for the maximal possible noise. This means the separating hyperplane can be constructed from the nearest points (3 in 2-D) of both classes. These points, however, may differ between attempts, as a different angle in some dimension may make other points the nearest (cf. Figure~\ref{fig:hyperplanes}).\\
\begin{figure}
\centering
\includegraphics[width=0.6\textwidth]{pictures/hyperplanes.png}
\caption{Two possible separating hyperplanes; the orange one has the larger margin}
\label{fig:hyperplanes}
\end{figure}
The hyperplane is defined as $\vec{w}\cdot \vec{x}-b=0$ while the hyperplanes tangent to the classes are $\vec{w}\cdot \vec{x}-b=1$ or $\vec{w}\cdot \vec{x}-b=-1$ respectively (see Figure~\ref{fig:svm}). The margin is $\frac{2}{||\vec{w}||}$. The margins have to be maximized while all data is classified correctly, so the problem can be formulated as:
$$\max \frac{2}{||\vec{w}||}\ s.t.\ y_i(\vec{w}\cdot\vec{x_i}-b)\geq 1,\ i=1,\dots,n,$$
where $y_i$ is the class (+1 or -1) of the corresponding data.
\begin{figure}
\centering
\includegraphics[width=0.8\textwidth]{pictures/svm.png}
\caption{Margins and hyperplane (Figure by Cyc and Peter Buch)}
\label{fig:svm}
\end{figure}
This basic form of an SVM is only able to separate two classes of linearly separable data. For other data some extensions are necessary.
\subsubsection{Multiclass SVM}
If there are more than two classes to separate, this can be done with SVMs in different ways. One approach is \emph{one-vs-one}: for every pair of classes an SVM is trained, and each SVM votes for one of its two classes. This is done for all pairs and the class with the most votes is picked.\\
Another approach is \emph{one-vs-all}, where every class is compared against all remaining ones. Here scores are used to determine which class matches best, i.e. for which class the data point lies farthest from the separating hyperplane.
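In \matlab{}, such multiclass SVMs can be built e.g. with \texttt{fitcecoc}, which combines binary SVM learners according to a chosen coding scheme. A minimal sketch, assuming a feature matrix \texttt{X} and direction labels \texttt{y} (values 1 to 4); the linear kernel is only an illustrative choice.
\begin{verbatim}
% X: observations x features, y: target direction (1..4) per observation
template = templateSVM('KernelFunction', 'linear');

% one SVM per pair of classes, majority vote over all pairs
mdlOvO = fitcecoc(X, y, 'Learners', template, 'Coding', 'onevsone');

% alternatively one SVM per class against all remaining ones
mdlOvA = fitcecoc(X, y, 'Learners', template, 'Coding', 'onevsall');

predictedLabels = predict(mdlOvO, X);
\end{verbatim}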
\subsubsection{Soft-margin SVM}
If the data is not separable, soft-margin SVMs have to be used. They allow misclassifications but try to minimize them. The problem can be formulated as
$$\text{Minimize }\frac{1}{N}\sum\limits_{i=1}^N\max\{0,1-y_i(\vec{w}\cdot\vec{x_i}-b)\}+\lambda ||\vec{w}||^2,$$
where $\lambda$ is the parameter that adjusts the trade-off between large margins and misclassifications (if $\lambda$ has a higher value, there is more weight on large margins).
\subsubsection{Kernel trick}
Data like in Figure~\ref{fig:kernel} are not \emph{linearly} separable. The idea here is to apply the \emph{kernel trick}, meaning to transform the data into a different space where they are linearly separable. In the example this is accomplished by using the distance from the origin as a feature and separating in that space.
\begin{figure}
\input{pictures/kernel.tikz}
\caption{Data separable with the kernel trick}
\label{fig:kernel}
\end{figure}
Common kernels are polynomial, Gaussian (RBF) and hyperbolic tangent kernels.
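A sketch of a two-class SVM with a Gaussian kernel using \matlab{}'s \texttt{fitcsvm}; the data are synthetic and only meant to mimic the radially separable situation of Figure~\ref{fig:kernel}, and \texttt{BoxConstraint} corresponds to the soft-margin trade-off described above.
\begin{verbatim}
% synthetic, radially separable two-class data
n = 200;
X = randn(n, 2);
y = double(sum(X.^2, 2) > 1.5);          % class 1 outside a circle, 0 inside

% Gaussian (RBF) kernel; BoxConstraint is the soft-margin trade-off
mdl = fitcsvm(X, y, ...
    'KernelFunction', 'gaussian', ...
    'KernelScale', 'auto', ...
    'BoxConstraint', 1);

trainError = resubLoss(mdl);             % error on the training data
\end{verbatim}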
\subsection{Regression}
Regression is the idea of finding $\beta$ so that $$y= X\beta+\epsilon,$$ where $X$ is the $n\times p$ input matrix and $y$ the $n\times 1$ output vector of a system. With this $\beta$, the output can be predicted from a given input.\\
There are different ways to find this $\beta$. One common approach is the \emph{ordinary least squares} algorithm: $$\hat{\beta}=\arg\min\limits_{b\in\mathds{R}^p} \left(y-Xb\right)^T\left(y-Xb\right),$$ meaning the chosen $\hat\beta$ is the $b$ which produces the lowest error, since $Xb$ should be - apart from the noise $\epsilon$ - the same as $y$.\\
Choosing $\beta$ like this brings the risk of overfitting: the training data is fitted well, but new data may be predicted poorly. This problem is addressed by ridge regression.
\subsubsection{Ridge Regression}
In ridge regression (also called \emph{Tikhonov regularization}) not only the error but also the size of the vector $\beta$ is minimized: $$\hat{\beta}=\arg\min\limits_{b\in\mathds{R}^p} \left(y-Xb\right)^T\left(y-Xb\right)+\lambda b^Tb.$$
$\lambda$ determines how large the influence of the size of $\beta$ (or $b$, respectively) is.
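Both estimators can be written in a few lines of \matlab{} using their closed-form solutions; \texttt{X} and \texttt{y} are assumed to be the input matrix and output vector defined above, and the value of \texttt{lambda} is arbitrary.
\begin{verbatim}
% ordinary least squares: beta minimizes ||y - X*b||^2
betaOLS = X \ y;                          % backslash solves the least squares problem

% ridge regression: additionally penalizes ||b||^2 with weight lambda
lambda    = 10;
p         = size(X, 2);
betaRidge = (X' * X + lambda * eye(p)) \ (X' * y);

yPred = X * betaRidge;                    % predicted output
\end{verbatim}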
\subsection{Cross Validation}
$k$-fold cross validation means splitting the data into $k$ equally sized parts, training the model on $k-1$ of them and validating it on the remaining part. This is repeated $k$ times so that every part is used for validation exactly once, and the results are averaged.
Cross validation may also be nested: within each training set an inner cross validation is run, for example to select hyperparameters, while the outer loop estimates the generalization error.
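A sketch of plain $k$-fold cross validation in \matlab{} using \texttt{cvpartition}; \texttt{trainModel} and \texttt{evaluateModel} are placeholders for an arbitrary training and evaluation step.
\begin{verbatim}
k   = 10;
n   = size(X, 1);
cvp = cvpartition(n, 'KFold', k);

accuracy = zeros(k, 1);
for i = 1:k
    trainIdx = training(cvp, i);          % logical index of the k-1 training parts
    testIdx  = test(cvp, i);              % logical index of the held-out part

    model       = trainModel(X(trainIdx, :), y(trainIdx));        % placeholder
    accuracy(i) = evaluateModel(model, X(testIdx, :), y(testIdx)); % placeholder
end
meanAccuracy = mean(accuracy);
\end{verbatim}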
\section{Experimental design}
The data used for this work were mainly recorded by Farid Shiman, Nerea Irastorza-Landa, and Andrea Sarasola-Sanz for their work (\cite{Shiman15},\cite{Sarasola15}). We were allowed to use them for further analysis.\\
There were 9 right-handed subjects with an average age of 25 (variance 6.67, minimum 20, maximum 28). Three female and 6 male subjects were tested. All the tasks were performed with the dominant right hand.\\
The task was a center-out reaching task to one of four targets (see Figure~\ref{fig:experimentalDesign}), during which 32-channel EEG, at least%
\footnote{\texttt{'AbdPolLo', 'Biceps', 'Triceps', 'FrontDelt', 'MidDelt'} and \texttt{'BackDelt'} were recorded for every subject, others only in some sessions. Only the six channels tracked in every session were used.} %
6-channel surface EMG and 7-DOF kinematics were tracked.
\begin{figure}
\centering
\includegraphics{experimentalDesign.jpg}
\caption{Center-out reaching task with four color-coded targets}
\label{fig:experimentalDesign}
\end{figure}
Of the kinematic information tracked we only used position ($x,y$) and angle ($\theta$, rotation around $z$-axis) of the hand.\\
Only complete sessions were used in our analysis to ensure better comparability.\\
One session consists of 5 runs with 40 trials each. The trials were separated by resting phases of varying length (2-3s, randomly assigned). Each trial began with an auditory cue specifying the target for this trial, which was chosen randomly but equally distributed over the four targets. This leads to 50 reaches to each target per session.
After the cue the participants were instructed to \qq{perform the movement and return to the starting position at a comfortable pace but within 4 seconds}\footnote{\cite{Shiman15}}.\\
For each subject there were 4 to 6 sessions, each recorded on a different day. All in all there were 255 runs in 51 sessions. Each session was analyzed independently as one continuous trial.
\section{Data Acquisition}
\subsection{Loading of data}
The data recorded with BCI2000 (\cite{Schalk04}) can be loaded into \matlab{} with a specific \texttt{.mex} file. The corresponding \texttt{.mex} files are available precompiled from BCI2000 for several platforms (Windows, macOS, Linux).\\
We load the signal together with the corresponding state data and the parameters (see Algorithm~\ref{alg:load_bcidat}).
\begin{algorithm}
\begin{algorithmic}
\State [signal, states, params] = load\_bcidat('dataFile.dat');
\end{algorithmic}
\caption{Usage of \texttt{load\_bcidat}}
\label{alg:load_bcidat}
\end{algorithm}
\subsubsection{Signal}
The signal is loaded as a matrix of 41 channels (see Table~\ref{tab:channelNames}). All values are loaded as integers,
but they can also be loaded as floating point values representing microvolts. Since the representation should not make any difference when analyzing the spectrum, we use the smaller integer representation.
\begin{table}
\centering
\begin{tabular}{c|c|l}
Number & Name &\ Meaning\\\hline
1 & Fp1&\ cf. Figure~\ref{fig:10-20}\\
2&Fp2&\ cf. Figure~\ref{fig:10-20}\\
3&F7&\ cf. Figure~\ref{fig:10-20}\\
4&F3&\ cf. Figure~\ref{fig:10-20}\\
5&Fz&\ cf. Figure~\ref{fig:10-20}\\
6&F4&\ cf. Figure~\ref{fig:10-20}\\
7&F8&\ cf. Figure~\ref{fig:10-20}\\
8&FC5&\ cf. Figure~\ref{fig:10-20}\\
9&FC1&\ cf. Figure~\ref{fig:10-20}\\
10&FC2&\ cf. Figure~\ref{fig:10-20}\\
11&FC6&\ cf. Figure~\ref{fig:10-20}\\
12&T7&\ cf. Figure~\ref{fig:10-20}\\
13&C3&\ cf. Figure~\ref{fig:10-20}\\
14&Cz&\ cf. Figure~\ref{fig:10-20}\\
15&C4&\ cf. Figure~\ref{fig:10-20}\\
16&T8&\ cf. Figure~\ref{fig:10-20}\\
17&TP9&\ cf. Figure~\ref{fig:10-20}\\
18&CP5&\ cf. Figure~\ref{fig:10-20}\\
19&CP1&\ cf. Figure~\ref{fig:10-20}\\
20&CP2&\ cf. Figure~\ref{fig:10-20}\\
21&CP6&\ cf. Figure~\ref{fig:10-20}\\
22&TP10&\ cf. Figure~\ref{fig:10-20}\\
23&P7&\ cf. Figure~\ref{fig:10-20}\\
24&P3&\ cf. Figure~\ref{fig:10-20}\\
25&Pz&\ cf. Figure~\ref{fig:10-20}\\
26&P4&\ cf. Figure~\ref{fig:10-20}\\
27&P8&\ cf. Figure~\ref{fig:10-20}\\
28&PO9&\ cf. Figure~\ref{fig:10-20}\\
29&O1&\ cf. Figure~\ref{fig:10-20}\\
30&Oz&\ cf. Figure~\ref{fig:10-20}\\
31&O2&\ cf. Figure~\ref{fig:10-20}\\
32&PO10&\ cf. Figure~\ref{fig:10-20}\\
33&AbdPolLo&\ abductor pollicis longus\\
34&Biceps&\ long head of the biceps\\
35&Triceps&\ the external head of triceps\\
36&FrontDelt&\ anterior portion of deltoid\\
37&MidDelt&\ lateral portion of deltoid\\
38&BackDelt&\ posterior portion of deltoid,\\
&&\ teres minor and infraspinatus muscles\\
39&HEOG&\ \textit{not used}\\
40&Synchro&\ Synchronization trigger for kinematic recording\\
41&Zero&\ Zero reference\\
\end{tabular}
\caption{Channel numbers and names}
\label{tab:channelNames}
\end{table}
\subsubsection{States}
The main information contained in the \texttt{states} \matlab{}\texttt{-struct} is the currently presented stimulus. The \texttt{struct} has the same length as the signal, so that for every sample of the signal there is corresponding state information.\\
Some adjustments were necessary since the stimulus information did not match the actual movements (cf. Section~\ref{sec:newClass}).
\subsubsection{Parameters}
All the settings from the BCI2000 recording are saved in and loaded from the \texttt{parameters}.\\
Examples are the names of the channels, the random seed for BCI2000, and the sounds, meanings and durations of the different stimuli.
\subsection{Filtering}
For filtering we use second order Butterworth filters.
For normal EEG we use:
\begin{enumerate}
\item Highpass at \texttt{minEEGFreq} (usually 2Hz)
\item Lowpass at \texttt{maxEEGFreq} (usually 49Hz)
\item Bandstop at 48-52Hz and 148-152Hz to eliminate power line noise (at 50Hz and its 150Hz harmonic)
\end{enumerate}
For low frequencies we use:
\begin{enumerate}
\item Highpass at 0.01Hz
\item Lowpass at 1Hz
\end{enumerate}
For EMG data we use:
\begin{enumerate}
\item Highpass at 2Hz to eliminate movement artifacts
\item Bandstop at 48-52Hz and 148-152Hz to eliminate power line noise (at 50Hz and its 150Hz harmonic)
\end{enumerate}
For all filtering we use \matlab{}'s \texttt{filtfilt} function, which applies each filter forwards and backwards and thus avoids phase shift from the multiple filtering steps.
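As an illustration, a \matlab{} sketch of this EEG filter chain is given below; the raw EEG matrix \texttt{eeg} (samples $\times$ channels) and the sampling rate of 500Hz are assumptions for this example, the cutoff values follow the lists above.
\begin{verbatim}
fs  = 500;                                 % assumed sampling rate in Hz
nyq = fs / 2;

% second order Butterworth filters (cf. the lists above)
[bHigh, aHigh] = butter(2, 2  / nyq, 'high');         % highpass at minEEGFreq
[bLow,  aLow ] = butter(2, 49 / nyq, 'low');          % lowpass at maxEEGFreq
[bBs1,  aBs1 ] = butter(2, [48 52]   / nyq, 'stop');  % bandstop around 50 Hz
[bBs2,  aBs2 ] = butter(2, [148 152] / nyq, 'stop');  % bandstop around 150 Hz

% zero-phase filtering (forwards and backwards) to avoid phase shift
filtered = filtfilt(bHigh, aHigh, eeg);
filtered = filtfilt(bLow,  aLow,  filtered);
filtered = filtfilt(bBs1,  aBs1,  filtered);
filtered = filtfilt(bBs2,  aBs2,  filtered);
\end{verbatim}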
\subsection{Windowing}
To process continuous EEG or EMG data it is necessary to define windows. These windows are shifted over the data to obtain discrete values for the further steps.\\
As defaults we choose for EEG
\begin{itemize}
\item window $1 s$
\item shift $200ms$
\end{itemize}
and for EMG
\begin{itemize}
\item window $0.2s$
\item shift $50ms$.
\end{itemize}
For re-synchronization we pass the shift to every function that needs to align EMG and EEG data and operate on \emph{time-from-start} there.
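A sketch of this windowing step in \matlab{}, using the default EEG values from the list above; the data matrix \texttt{data}, the sampling rate and the function \texttt{extractFeatures} (standing for whatever is computed per window, e.g. the PSD) are placeholders.
\begin{verbatim}
fs     = 500;                      % assumed sampling rate in Hz
winLen = round(1.0 * fs);          % window length: 1 s
shift  = round(0.2 * fs);          % window shift: 200 ms

nSamples = size(data, 1);
starts   = 1:shift:(nSamples - winLen + 1);

features = cell(numel(starts), 1);
for w = 1:numel(starts)
    idx         = starts(w):(starts(w) + winLen - 1);
    features{w} = extractFeatures(data(idx, :));   % placeholder per-window step
end

% time (in seconds from start) of each window onset, used for EEG/EMG alignment
windowTimes = (starts - 1) / fs;
\end{verbatim}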
\subsection{Waveform length}
For the representation of the EMG data we used the waveform length as described by Phinyomark et al. in \cite{Phinyomark12}, which gives a measure of the signal variation within the window. It is calculated as the sum of the absolute differences of consecutive measurements in the time window:
$$\sum\limits_{i=2}^N \left| x_i-x_{i-1}\right|,$$
where $N$ is the number of measurements in the time window and $x_i$ are the measurements themselves.\\
Intuitively, it describes the length of the line a pen would draw when plotting the EMG on a continuously moving strip of paper.
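In \matlab{} the waveform length of one window reduces to a single line; a sketch for a window matrix \texttt{win} (samples $\times$ EMG channels), which is an assumed variable name.
\begin{verbatim}
% waveform length per channel: sum of absolute sample-to-sample differences
wl = sum(abs(diff(win, 1, 1)), 1);     % 1 x channels vector
\end{verbatim}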
\subsection{Classification}
\label{sec:newClass}
Very poor results when classifying EMG into Move/Rest made us inspect the data further: the actual movement started a considerable time after the stimulus.\\
To address this problem we re-classified the data according to the actual movements (cf. Appendix~\ref{code:classify}). To decide whether the subject is moving or not, we compare the mean EMG activity (computed from the waveform length) to a threshold (10,000 by default).\\
If there is movement, we define the class occurring most often in the preceding second as the current task. If there is movement but the stimulus indicates rest, we assign the last active stimulus.\\
In addition we exclude the second before movement onset from the data (classified as -1) and (optionally) assign the half second before movement onset to the following stimulus.\\
Finally we do some smoothing by assigning to each time step the class that occurs most often from one second before to one second after it.\\
As a last step we adjust the length of the stimulus vector to the length of the EEG data.\\
For the further analysis we then only use data classified differently from -1, meaning they are either clear rest or clear movement.
\subsection{Synergies}
%TODO
\section{Data Analysis}
\subsection{Classification}
Classification can be done in different ways. The first approach was to discriminate movement from rest. This was done by training an SVM and testing its results with 10-fold cross validation, separately for EMG, EEG and LF data. For EMG this setting is trivial, since EMG was the basis of the class labels (cf. \ref{sec:newClass}).\\
In a second step we tried to discriminate movements in different directions, again with an SVM trained on EMG, EEG or LF data respectively. The fit of the model was again checked with 10-fold cross validation.
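As a sketch, the Move/Rest step could look as follows in \matlab{}, assuming a feature matrix \texttt{features} (one row per window) and a label vector \texttt{labels} from the re-classification in Section~\ref{sec:newClass}; the linear kernel is an assumption for this example.
\begin{verbatim}
% features: windows x featureDimensions, labels: Move/Rest class per window
mdl = fitcsvm(features, labels, 'KernelFunction', 'linear');

cvmdl    = crossval(mdl, 'KFold', 10);    % 10-fold cross validation
accuracy = 1 - kfoldLoss(cvmdl);          % mean classification accuracy over folds
\end{verbatim}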