Reproduce, Optimize, Expand: Revisiting a multi-session behavioral biometric dataset for user identification in VR
Introduction
Virtual Reality (VR) has gained increasing relevance in recent years, especially with the advent of consumer-grade hardware. As a result, the variety of content and use cases for VR is growing as well. Many of these could benefit from, or even require, knowledge of the user's identity (for example e-commerce, therapy, rehabilitation, social VR, and gaming). The importance of user identification becomes especially apparent when considering VR setups as workspaces shared by multiple users.
Identification processes can generally be divided into three categories. Most common is knowledge-based identification, such as passwords, PINs, or patterns. The second category uses biometric features of a user, such as fingerprints or iris details. A third category relies on behavioral biometrics, where unique patterns within a user's specific interactions are leveraged for identification. The first category is the simplest in terms of implementation, but has some drawbacks, especially for VR applications. Entering a password or PIN in VR requires a text input method suited for VR. Even though many different approaches have been investigated [1, 2, 3, 4], the speed and comfort of typing in a desktop-like environment are still unmatched by VR methods. The main drawback of pattern-based processes (e.g., drawing a pattern by connecting dots in a 3x3 grid) is that they raise security concerns, since the user in VR is not aware of the surrounding real world and potential bystanders [5]. The use of biometric features is almost non-intrusive and fast, but requires additional hardware such as fingerprint readers, which are not typically included in VR setups. Behavioral biometrics share the benefits of static biometric approaches, but do not require additional hardware. The principle behind behavioral biometrics is that the details of a person's behavior while pursuing a specific task are unique to that individual. In the context of virtual reality, such details are movement patterns in 3D space.
Related work
Previous work in this field has already shown that the 3D-spatial information about a user's interactions, naturally recorded by any VR setup, is sufficient for user identification in many scenarios (see Table 1).
Table 1: Overview of related work. SR, BR, and BRV indicate feature computations (scene-relative, body-relative, body-relative velocity); N indicates the number of individual participants.
Origin: Schell et al. (2021) [6]
authors | classifier | task | features | dataset | device |
---|---|---|---|---|---|
Rogers et al. (2015) | random forest | ident.: view a series of rapidly changing numbers and letters | acceleration, orientation of head + eye blinking | N=20; unpublished | Google Glass |
Li et al. (2016) | distance-based | auth.: listening and nodding to music | acceleration of head | N=95; unpublished | Google Glass |
Mustafa et al. (2018) | logistic regression, SVM | auth.: walking to checkpoints in VR scene | acceleration, orientation of head | N=23; unpublished | Google VR |
Kupin et al. (2019) | nearest neighbor, distance-based | auth.: throwing a ball | SR of right controller | N=14; unpublished | HTC Vive |
Pfeuffer et al. (2019) | random forest, SVM | ident.: point, grab, walk, type | SR, orientation, linear velocity, angular velocity of head and controllers | N=22; unpublished | HTC Vive |
Shen et al. (2019) | distance-based | auth.: walking a few steps | acceleration, orientation of head | N=20; unpublished | Google Glass |
Ajit et al. (2019) | nearest neighbor, distance-based | auth.: throwing a ball | SR of head and controllers | N=33; unpublished | HTC Vive |
Mathis et al. (2020) | fully convolutional network | auth.: interaction with a 3D cube | SR of controllers | N=23; unpublished | HTC Vive |
M. Miller et al. (2020) | random forest | ident.: watching 360° videos and answering questionnaire | SR of controllers and head | N=511; unpublished | HTC Vive |
R. Miller et al. (2020) | distance-based | auth.: ball throwing | SR, BRV, angular velocity of controllers and head, trigger position of controllers | N=41; unpublished | HTC Vive & Vive Cosmos, Oculus Quest |
Olade et al. (2020) | nearest neighbor | auth. and ident.: grab, rotate, drop balls and cubes | SR of head and controllers + eye gaze | N=25; unpublished* | HTC Vive |
Liebers et al. (2021) | LSTM, MLP | auth.: bowling, archery | BR of HMD and controllers | N=16; published | Oculus Quest |
Schell et al. (2021) | random forest, MLP, FRNN, LSTM, GRU | ident.: Conversation | SR, BR, BRV of head and hands | N=34; published | 3-point tracking from full body mocap |
However, following the discussion by Schell et al. [6] of previous work, there are several shortcomings that should be addressed: First (1), most results so far were not validated against data recorded in a separate session. Although discussed by some authors [6, 7, 8, 9], the impact of different pre-processing approaches on multi-session scenarios needs further investigation. Second (2), only Schell et al. [6] employed proper hyperparameter optimization for their machine learning classifiers, yet this process is key to unveiling the true potential of any machine learning approach. Third (3), only Schell et al. [6] and Liebers et al. [8] used publicly available datasets. Therefore, none of the other previous work can be validated through reproduction. This is especially problematic, since machine learning based solutions are very sensitive to flaws in development and methodology.
To address these shortcomings, I will focus on the work of Liebers et al. [8] and Schell et al. [6]. With the dataset of Liebers et al. being publicly available, I can investigate the reliability of their results, especially considering the multi-session nature of the dataset. Then, with the enhanced machine learning methodology described by Schell et al., I can investigate the validity and optimization potential of the results of Liebers et al. In a third stage, I can use the original dataset to explore different pre-processing approaches suggested by related work and compare them against each other.
Liebers et al. (2021)
Liebers et al. [8] investigated the impact of users' physiology on behavioral biometric user identification systems. Since my work will build upon the dataset they created, I will briefly describe it here. Two simple VR games, archery and bowling, were developed (the scenarios). Shooting a single arrow, or rolling the ball once, was considered a single repetition of the corresponding scenario. As the authors investigated the effects of physiology, they created four conditions by manipulating the virtual avatars of the users:
- HeightNormalization (HN), in which the height of the avatar was artificially set to be the same for every participant,
- ArmLengthNormalization (AN), in which the acceleration of the virtual hands was normalized across all participants, effectively granting each participant the same arm span,
- BothNormalizations (BN), a combination of HN and AN, and
- WithoutNormalization (WN), in which no avatar manipulation took place.
For both scenarios, each of the 16 participants performed 12 repetitions in each condition. The same recording procedure was then repeated on a second day. The data consists of the positional and rotational data of the hand-held controllers and the head-mounted display, sampled at 72 Hz during the individual repetitions. Along with this motion data, timestamps and a discrete phase variable (e.g., (a) pick up the arrow, (b) draw the bow, (c) aim, (d) shoot) were added to every sample. For training purposes, four combinations of motion data, timestamp, and phase were separated into individual feature sets (see Table 3).
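To make the data handling concrete, the following is a minimal loading and windowing sketch in Python. The file layout and column names (`head_pos_x`, `phase`, ...) are assumptions for illustration only; the published files of Liebers et al. may be organized differently.

```python
import numpy as np
import pandas as pd

# Hypothetical column names -- the actual files may use different identifiers.
MOTION_COLS = [
    f"{device}_{axis}"
    for device in ("head", "left_hand", "right_hand")
    for axis in ("pos_x", "pos_y", "pos_z", "rot_x", "rot_y", "rot_z")
]

def load_repetition(path: str) -> pd.DataFrame:
    """Load one repetition: motion samples at ~72 Hz plus timestamp and phase."""
    df = pd.read_csv(path)
    return df[MOTION_COLS + ["timestamp", "phase"]]

def to_windows(df: pd.DataFrame, window_size: int) -> np.ndarray:
    """Cut a repetition into non-overlapping windows of `window_size` frames,
    yielding an array of shape (num_windows, window_size, num_features)."""
    values = df[MOTION_COLS].to_numpy(dtype=np.float32)
    num_windows = len(values) // window_size
    return values[: num_windows * window_size].reshape(num_windows, window_size, -1)
```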
Research Questions
- How reliable are the results of Liebers et al. [8]?
- a) To what extent can the results of Liebers et al. [8] be improved via adequate hyperparameter optimization?
  b) Does the HeightNormalization condition remain superior to the WithoutNormalization condition in terms of identification accuracy if hyperparameter optimization is employed for both?
- How do velocity and acceleration based approaches compare on the dataset of Liebers et al. [8]?
Methodology
All research questions will be pursued individually.
Question 1 - Replication of Liebers et al. (2021)
In order to gain insight into the reliability of the original results, a replication attempt will be made. Originally, Liebers et al. investigated both multi-layer perceptron (MLP) and recurrent neural network (RNN) architectures. For my replication, I will focus on RNNs, as they were found to perform significantly better in the original work [8]. This behavior is expected to hold true for sequential data in general [10], and was found for a similar task by Schell et al. [6].
Table 2 shows the accuracies for the various combinations of condition and scenario achieved by Liebers et al. The cells in the blue area mark the configurations targeted by Research Question 1 (MLP columns are not considered).
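As a starting point for the replication, the following sketch shows what such an RNN classifier could look like in PyTorch (my tool choice). It does not reproduce the exact architecture of Liebers et al.; hidden size, layer count, and the use of the last hidden state for classification are assumptions that the later optimization will vary.

```python
import torch
import torch.nn as nn

class MotionRNN(nn.Module):
    """Sequence classifier over motion windows: (batch, frames, features) -> user logits."""

    def __init__(self, num_features: int, num_users: int = 16,
                 base_model: str = "GRU", hidden_size: int = 64,
                 num_layers: int = 2, dropout: float = 0.3):
        super().__init__()
        rnn_cls = nn.GRU if base_model == "GRU" else nn.LSTM
        self.rnn = rnn_cls(num_features, hidden_size, num_layers,
                           batch_first=True,
                           dropout=dropout if num_layers > 1 else 0.0)
        self.head = nn.Linear(hidden_size, num_users)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        output, _ = self.rnn(x)           # output: (batch, frames, hidden_size)
        return self.head(output[:, -1])   # classify from the last time step
```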
Question 2 - Optimization on Liebers et al. (2021)
Liebers et al. [8] do not report on hyperparameter optimization. Since this process usually reveals the true potential of a machine learning approach, I will search for optimized parameters, leading to a new set of trained models. To perform this optimization, the available data must be split into three datasets. All recordings from the second session will be kept as a test dataset, reserved for the comparison of the final results. The data from the first session will be split into a training and a validation dataset for training and optimization purposes.
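A minimal sketch of this session-based split, assuming each recording carries `session` and `user` labels (names chosen for illustration) and using scikit-learn for a stratified split of the first session:

```python
from sklearn.model_selection import train_test_split

def split_by_session(recordings):
    """Session 2 is held out as the test set; session 1 is split into
    training and validation data for fitting and hyperparameter search."""
    session1 = [r for r in recordings if r["session"] == 1]
    session2 = [r for r in recordings if r["session"] == 2]
    train, val = train_test_split(
        session1, test_size=0.2, random_state=42,
        stratify=[r["user"] for r in session1],  # keep all 16 users in both splits
    )
    return train, val, session2
```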
To keep the number of unique combinations within the scope of this work, I will not optimize the parameters for every condition introduced by Liebers et al. Instead, I will focus on the WN condition. However, the specific configurations Archery+F3+HeightNormalization and Bowling+F2+HeightNormalization will also be considered, since, according to the authors [8], they achieved the best accuracies for the archery and bowling tasks, respectively. Optimizing these configurations allows a comparison with the WN trials. The corresponding results of Liebers et al. are summarized in Table 2 (highlighted in orange).
The hyperparameter optimization process is divided into two stages, following the work of Schell et al. [6]. The first stage deals with the parameters of the architecture of each model, while the second stage searches for data-related parameters. The individual parameters and search spaces are listed in Table 4. In the first stage, all data parameters are frozen in the state they were used by Liebers et al., while architectural parameters are searched for each scenario (archery/bowling) and feature set (see Table 3) combination of the WN data, as well as for the two HN configurations mentioned above. In the second stage, the best architectural parameters found for each combination are fixed, and the data parameters for each combination are searched. A sketch of such a two-stage search follows Table 4.
Table 4: Overview of hyperparameters and search spaces

Architectural parameter | Search space | Data parameter | Search space
---|---|---|---
base model | LSTM, GRU | window size | 10 - 180 frames
num layers | 1 - 5 | sampling rate | 75, 50, 25 Hz
hidden size | 20 - 150 | |
dropout | 0 - 0.7 | |
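To make the first stage concrete, the following sketches the architectural search with Optuna (my tool choice; neither paper prescribes a framework). `train_and_evaluate` is a placeholder for the actual training loop and returns a dummy value here so the sketch runs as-is.

```python
import optuna

def train_and_evaluate(train_data, val_data, **params) -> float:
    # Placeholder: train a MotionRNN with `params` on train_data and
    # return the validation accuracy. Dummy value so the sketch runs.
    return 0.0

def make_stage1_objective(train_data, val_data):
    # Stage 1: search architectural parameters (left half of Table 4) while
    # the data parameters stay frozen to the settings of Liebers et al.
    def objective(trial: optuna.Trial) -> float:
        params = {
            "base_model": trial.suggest_categorical("base_model", ["LSTM", "GRU"]),
            "num_layers": trial.suggest_int("num_layers", 1, 5),
            "hidden_size": trial.suggest_int("hidden_size", 20, 150),
            "dropout": trial.suggest_float("dropout", 0.0, 0.7),
        }
        return train_and_evaluate(train_data, val_data, **params)
    return objective

study = optuna.create_study(direction="maximize")
study.optimize(make_stage1_objective(train_data=[], val_data=[]), n_trials=100)
print(study.best_params)  # frozen afterwards for the stage-2 data-parameter search
```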
Question 3 - Comparing positional data against velocity and acceleration data
For the final research question, I will derive velocity and acceleration data from the WN motion data. The four feature sets will stay the same, except that the positions and angles will be replaced with velocities in one condition and with accelerations in another. In doing so, I follow suggestions made by previous work [11, 9, 7, 6]. The same optimization procedure as described for Question 2 will be employed here; a sketch of the derivation follows below.
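A minimal sketch of this derivation via finite differences, assuming the motion data of one repetition as a frames-by-features array sampled at 72 Hz:

```python
import numpy as np

def derive_velocity_acceleration(motion: np.ndarray, sampling_rate: float = 72.0):
    """Derive velocity and acceleration features from raw motion data via
    finite differences over time.

    motion: array of shape (frames, features), e.g. positions and angles of
    head and controllers. Note that angular features may need unwrapping
    before differentiation (not handled in this sketch).
    """
    dt = 1.0 / sampling_rate
    velocity = np.gradient(motion, dt, axis=0)        # first time derivative
    acceleration = np.gradient(velocity, dt, axis=0)  # second time derivative
    return velocity, acceleration
```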
This will produce 16 new classifiers, illustrated by the green additions in Table 2. In a final comparison, all of the optimized results will then be compared and discussed.
Literature
1. Speicher, M., Feit, A. M., Ziegler, P., & Krüger, A. (2018). Selection-based Text Entry in Virtual Reality. https://doi.org/10.1145/3173574.3174221
2. Knierim, P., Kosch, T., Groschopp, J., & Schmidt, A. (2020). Opportunities and challenges of text input in portable virtual reality. Conference on Human Factors in Computing Systems - Proceedings. https://doi.org/10.1145/3334480.3382920
3. Knierim, P., Schwind, V., Feit, A. M., Nieuwenhuizen, F., & Henze, N. (2018). Physical Keyboards in Virtual Reality: Analysis of Typing Performance and Effects of Avatar Hands. Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3173574.3173919
4. Grubert, J., Witzani, L., Ofek, E., Pahud, M., Kranz, M., & Kristensson, P. O. (2018). Text Entry in Immersive Head-Mounted Display-based Virtual Reality using Standard Keyboards. IEEE Virtual Reality (VR).
5. Yu, Z., Liang, H.-N., Fleming, C., & Man, K. L. (2016). An exploration of usable authentication mechanisms for virtual reality systems. 2016 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS). https://doi.org/10.1109/APCCAS.2016.7804002
6. Schell, C., Latoschik, M. E., & Hotho, A. (2021, submitted for review). User and Avatar Identification for XR by Deep Learning of Arbitrary Motion Data Sequences.
7. Liebers, J., & Schneegass, S. (2020). Gaze-based Authentication in Virtual Reality. Eye Tracking Research and Applications Symposium (ETRA). https://doi.org/10.1145/3379157.3391421
8. Liebers, J., Abdelaziz, M., & Mecke, L. (2021). Understanding user identification in virtual reality through behavioral biometrics and the effect of body normalization. Conference on Human Factors in Computing Systems - Proceedings. https://doi.org/10.1145/3411764.3445528
9. Miller, M. R., Herrera, F., Jun, H., Landay, J. A., & Bailenson, J. N. (2020). Personal identifiability of user tracking data during observation of 360-degree VR video. Scientific Reports, 10(1), 1-10. https://doi.org/10.1038/s41598-020-74486-y
10. Madsen, A. (2019). Visualizing memorization in RNNs. Distill, 4(3), e16.
11. Mustafa, T., Matovu, R., Serwadda, A., & Muirhead, N. (2018). Unsure how to authenticate on your VR headset? Come on, use your head! IWSPA 2018 - Proceedings of the 4th ACM International Workshop on Security and Privacy Analytics. https://doi.org/10.1145/3180445.3180450
Contact Persons at the University of Würzburg
Christian Schell (Primary Contact Person), Universität Würzburg
christian.schell@uni-wuerzburg.de