# New Intensive course

Discover the new intensive course - An Intuitive Approach to Machine Learning: Boosting, Nearest Neighbors, Random Forests and Support Vector Machines. Find out more details below.

# Workshops "intensive courses" opened for juniors.

You can register for the intensive courses during the registration process.

### An Intuitive Approach to Machine Learning: Boosting, Nearest Neighbors, Random Forests and Support Vector Machines

Organized by Prof. Dr. Andreas Ziegler, CEO of StatSol, Professor and head of the Institute of Medical Biometry and Statistics at the University of Lübeck, Germany

**BIOGRAPHICAL SKETCH**

Andreas Ziegler holds a Ph.D. in Statistics and is a board-certified trial biostatistician and epidemiologist. He has been CEO of StatSol, a consulting company, since 2004. Since 2001, he has been professor and head of the Institute of Medical Biometry and Statistics at the University of Lübeck, Germany. He is an honorary professor of Biostatistics at the University of KwaZulu-Natal, South Africa. He was president of the German Region of the International Biometric Society and of the International Genetic Epidemiology Society. He is a member of the Executive Board of the International Biometric Society.

Recent publications on machine learning include:

Dankowski & Ziegler 2016 Stat Med 35:3949-60

Kruppa et al. 2014 Biom J 56:534-563

Wright & Ziegler 2017 J Stat Softw 77:1-17

Ziegler & König 2014 WIRES Data Min Knowl Discov 4:55-63

**COURSE DESCRIPTION**

Big data from high-throughput genetic studies or image analyses allow the extraction of new knowledge. Machine learning is the umbrella term for methods of automated knowledge extraction using computers. Recently, statistical properties such as consistency or asymptotic normality of the estimators have been derived for some learning machines, which provides a better understanding of these machines. Furthermore, several machines have been extended to operate beyond the standard classification problem for dichotomous endpoints. This course provides an introduction to some of the most important machine learning approaches currently in use. The focus is on a non-technical, intuitive explanation of the algorithms. The use of the machines will be illustrated with R code examples. Simple descriptions in a language familiar to biostatisticians, together with an illustration of how the machines can be used in standard statistical software, should help to demystify machine learning.

The aims of the course are to:

- introduce the key concepts in statistical learning and corresponding R software,
- discuss the obstacles for the application of statistical learning in practice and
- present and illustrate some extensions that are the subject of ongoing

At the end of the course, participants should:

- know the fundamental ideas of
  - boosting (Adaboost, gradient boosting, likelihood boosting),
  - classification, probability estimation, regression and survival trees,
  - nearest neighbors (k nearest neighbors, bagged nearest neighbors),
  - random forests with extensions and
  - support vector machines,
- know the basic algorithms of the machine learning approaches and their tuning parameters,
- know the most important statistical properties of the machine learning approaches,
- be able to perform basic machine learning analyses with these machines for various endpoints using the free software package R.

**TARGET AUDIENCE**

Academic researchers, university lecturers and graduate students working on/in

- the field of epidemiology,
- applications in clinical medicine,
- applications in genomic data,
- the development of statistical methodology,
- the development of algorithms.

**DURATION**

3/4 day (Three 90 min sessions)

**OUTLINE**

Session 1 (01:00 pm – 02:30 pm):

The first session is devoted to the simpler machine learning approaches for the classification problem, also termed class prediction, with two classes. Specifically, the following topics will be considered in this theoretical session:

- k-nearest neighbors (kNN)
- Bootstrapping (with replacement, without replacement, down-sampling, up-sampling)
- Bootstrap averaged (bagged) nearest neighbors (bNN)
- Classification trees (all components of the algorithm)
- Boosting stumps
- Random forests (RF)
- Adaboost
- Gradient boosting (GB)

Session 2 (02:45 pm – 04:15 pm):

After the first coffee break, the theoretical part will continue for approximately another 45 min.

The following topics will be considered:

- Evaluation of the performance of machines, how to compare machines
- Statistical properties of the machines (convergence rate, consistency, asymptotic normality)

In the second half of the session, the use of the following machines will be illustrated:

- kNN
- bNN
- RF
- GB

Session 3 (04:30 pm – 06:00 pm):

The final session will be devoted to support vector machines (SVM) and other types of endpoints. It will be up to course participants whether more focus is put on conceptual aspects or on computer illustrations.

Specifically, the following topics might be considered in this session:

- Support vector machines for the classification problem
- Probability estimation for dichotomous endpoints
- Survival analysis
- Continuous endpoints

The following machines might be illustrated in the computer part of the session:

- Classification for dichotomous endpoint: SVM
- Probability estimation: kNN, bNN, RF, GB, SVM (code illustration)
- Multicategory classification and probability estimation: kNN, bNN, RF, GB, SVM (code illustration)
- Survival: bNN, RF, SVM (code illustration for SVM)
- Continuous endpoints: kNN, bNN, RF, GB, SVM

**TRAINER**

- Prof. Dr. Andreas Ziegler, CEO of StatSol, Professor and head of the Institute of Medical Biometry and Statistics at the University of Lübeck, Germany

### Investigating outbreaks: an introduction to the detective work of epidemiologists

Organized by:

**OVERVIEW**

On February 12th 2018, the surveillance officer of the Public Health Department in Québec (Canada) noticed that the number of reported legionella cases was higher than usual for the week. On Tuesday, April 25th 2017, the emergency team of the Public Health Institute of Liberia received a call from the district of S. reporting that 8 deaths and 13 cases of an unknown disease had occurred during the day and that all patients had participated in a funeral ceremony. In both situations, you suspect that something is going wrong. How do you describe the event and the risk for the population? Is the number of cases in either of these situations higher than usual? What should be used to estimate the "usual pattern"? If it is higher than usual, should the health department staff consider the situation a cluster or an outbreak? What information do you need, and how do you get it? How do you conduct such an investigation? How do you use the results of your investigation to provide recommendations to control the outbreak? These and related questions will be addressed in this course.

**OBJECTIVES - LEARNING OUTCOMES**

On completion of this course you should be able to:

- Explain the current and emerging threats that epidemics pose to our societies
- Define cluster and outbreak
- List the reasons that health agencies investigate outbreaks
- Describe the steps of an outbreak investigation
- Identify available training resources in field investigation and the main references to consult on this topic

**COURSE DESCRIPTION**

Drawing on recent investigations in different settings and countries, in both domestic and international contexts, this course gives an overview of the basic steps for conducting field investigations. It includes short presentations, small-group discussions and exercises.

**TARGETED AUDIENCE**

The course is designed for health professionals from the public or private sectors who are interested in disease surveillance or investigation, as well as students and researchers in epidemiology or related fields.

**PRE-REQUISITE**

There are no pre-requisite courses or training for this course. However, a basic knowledge of the practices of public health and biostatistics is recommended.

**DURATION**

The duration of the course is 3 ½ hours.

**TRAINERS**

- Louise Alain, MSc, Epidemiologist, Regional Public Health Center, Québec (Canada). EPITER member.
- Anne Perrocheau, MD, MScPH, Medical Epidemiologist, Emergency Department, World Health Organization, Geneva, Switzerland. EPITER member.