diff --git a/text/thesis/02MaterialsAndMethods.tex b/text/thesis/02MaterialsAndMethods.tex
index 8fb48ea..aad9bd2 100644
--- a/text/thesis/02MaterialsAndMethods.tex
+++ b/text/thesis/02MaterialsAndMethods.tex
@@ -159,10 +159,40 @@ Choosing $\beta$ like this brings the risk of overfitting. The training data is
 matched well, new data however are classified poorly. This problem is addressed with RIDGE-Regression.
 \subsubsection{RIDGE-Regression}
 Instead of minimizing only the error in RIDGE-Regression (also called \emph{Tikhonov regularization}) the size of vector $\beta$ is also minimized:
 $$\hat{\beta}=\arg\min\limits_{b\in\mathds{R}^p} \left(y-Xb\right)^T\left(y-Xb\right)+\lambda b^Tb.$$
-	$\lambda$ decides how big the influence of size of $\beta$ or $b$ respectively is.
+	$\lambda$ determines how strongly the size of $\beta$, or $b$ respectively, is penalized.
 \subsection{Cross Validation}
-	$k$-fold cross validation means splitting the data into $k$ equally sized parts, training the model on $k-1$ parts and validating on left one.%TODO
-	%May be nested (example)
+	$k$-fold cross validation means splitting the data into $k$ equally sized parts, training the model on $k-1$ parts, and validating on the remaining part (see Figure~\ref{fig:crossValidation}).
+	\begin{figure}
+		\includegraphics[width=\textwidth]{pictures/K-fold_cross_validation_EN.jpg}
+		\caption{$k$-fold cross validation (picture by Joan.domenech91 and Fabian Flöck)}
+		\label{fig:crossValidation}
+	\end{figure}
+	This provides a measure of how well the model fits. With cross validation, every sample is used for training and every sample is predicted exactly once, which reduces the influence of a particular random split.\\
+	The motivation for, e.g., 10-fold cross validation is that a large fraction of the data ($9/10$) remains available for training. If there is enough data, one may use only 2-fold cross validation to lower the computation time.
+
+	Cross validation can also be nested. This is necessary, for example, when cross validation is used both for parameter optimization and for measuring the fit, or when more than one parameter has to be optimized. When nesting, the computation time grows exponentially in the number of nested cross validation levels (see Algorithm~\ref{alg:cv}).
+	\begin{algorithm}
+		\begin{algorithmic}
+			\State split the data into ten parts data$\{0,\dots,9\}$
+			\For{$i\gets 0,\dots,9$}
+				\State remainingData $\gets$ data$\{0,\dots,9\}\setminus$ data$\{i\}$
+				\For{$k\gets 0,\dots,n-1$}\Comment{$n$ is the number of parameter values tested}
+					\For{$j\gets 0,\dots,9$ with $j\neq i$}
+						\State remainingData' $\gets$ remainingData $\setminus$ data$\{j\}$
+						\State train with remainingData' and parameter value $k$
+						\State validate on data$\{j\}$ $\rightarrow$ fit$_j$
+					\EndFor
+					\State fit for parameter value $k$ is the mean of the fit$_j$
+				\EndFor
+				\State choose the parameter value $k$ with the best fit
+				\State train with remainingData and the best parameter value
+				\State validate on data$\{i\}$ $\rightarrow$ fit$_i$
+			\EndFor
+			\State the overall fit is the mean of the fit$_i$
+		\end{algorithmic}
+		\caption{Nested 10-fold cross validation with parameter optimization}
+		\label{alg:cv}
+	\end{algorithm}
 \section{Experimental design}
 The data used for this work were mainly recorded by Farid Shiman, Nerea Irastorza-Landa, and Andrea Sarasola-Sanz for their work (\cite{Shiman15},\cite{Sarasola15}). We were allowed to use them for further analysis.\\
 There were 9 right-handed subjects with an average age of 25 (variance 6.67, minimum 20, maximum 28). Three female and 6 male subjects were tested.
 All the tasks were performed with the dominant right hand.\\
diff --git a/text/thesis/pictures/K-fold_cross_validation_EN.jpg b/text/thesis/pictures/K-fold_cross_validation_EN.jpg
new file mode 100644
index 0000000..f8418c7
--- /dev/null
+++ b/text/thesis/pictures/K-fold_cross_validation_EN.jpg
Binary files differ
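
Note on the ridge estimator referenced in the patch: the stated minimization has the well-known closed-form solution beta_hat = (X^T X + lambda*I)^{-1} X^T y. Below is a minimal NumPy sketch of that estimator; the names ridge_fit and ridge_predict are illustrative and are not taken from the thesis code.

import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge / Tikhonov estimate: beta = (X^T X + lam * I)^{-1} X^T y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def ridge_predict(X, beta):
    # Linear prediction y_hat = X @ beta
    return X @ beta

Using np.linalg.solve instead of an explicit matrix inverse is numerically preferable; with lam = 0 this reduces to ordinary least squares, provided X^T X is invertible.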
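
The nested cross validation of Algorithm alg:cv can be sketched as follows. This is an illustration only: using mean squared error as the fit measure and the ridge penalty lambda as the optimized parameter are assumptions made for this sketch, not necessarily the thesis' actual pipeline, and the function names are hypothetical.

import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge estimate, as in the sketch above
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def nested_cv(X, y, lambdas, k=10, seed=0):
    # NOTE: mean squared error and the ridge lambda as the tuned parameter
    # are assumptions for this sketch.
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)                 # data{0},...,data{k-1}
    outer_scores = []
    for i in range(k):                             # outer loop: fold i is held out
        rest = [j for j in range(k) if j != i]
        inner_scores = []
        for lam in lambdas:                        # try every candidate parameter value
            errs = []
            for j in rest:                         # inner loop over the remaining folds
                train = np.concatenate([folds[m] for m in rest if m != j])
                beta = ridge_fit(X[train], y[train], lam)
                errs.append(np.mean((X[folds[j]] @ beta - y[folds[j]]) ** 2))
            inner_scores.append(np.mean(errs))     # mean validation error for this lambda
        best_lam = lambdas[int(np.argmin(inner_scores))]
        # retrain on all remaining data with the selected lambda, score on fold i
        remaining = np.concatenate([folds[m] for m in rest])
        beta = ridge_fit(X[remaining], y[remaining], best_lam)
        outer_scores.append(np.mean((X[folds[i]] @ beta - y[folds[i]]) ** 2))
    return np.mean(outer_scores)                   # overall fit: mean over the outer folds

For example, nested_cv(X, y, lambdas=[0.1, 1.0, 10.0]) returns the mean held-out error over the ten outer folds, with lambda chosen separately inside each outer fold so that the outer score is never used for parameter selection.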