Are my hands enough? Using inverse kinematics to improve user identification
This project is already completed.

Motivation
Virtual Reality (VR) in social contexts is becoming ever more present and by now even has its own name: Social VR. If Facebook's plans for the future are to be trusted, a large share of human interaction could move over to VR, not only in private life but also at work. As VR grows in importance, so does its responsibility to be a trustworthy, tamper-free platform. Am I really talking to my colleague, or is it just someone wearing their headset? Researchers have published many solutions for verifying user identity, like 3D pattern passwords1 or throwing a ball2. Yet, I prefer continuous recognition, where users are constantly recognized during any task. That spares users from re-authenticating themselves via password and losing immersion. I plan to analyse users' arbitrary movements while they are interacting with other people in a work-based context. That enables us to constantly identify users during any movement in a social setting. Since continuous recognition approaches still lack accuracy in comparison to physical biometrics3, I aim to improve the former in this work.
Related Work
There has been a considerable amount of research on movement-based recognition of people. Yet each study has its own use case, its own recognition approach and its own data set (see table 1 from Schell et al., 20214, found in section Resources). Hence, recent findings are hard to compare and more research has to be done.
This work aims to extend our knowledge of machine-learning-based recognition of users in VR by means of their movement. In the context of user recognition, two common terms appear: identification and authentication. Both try to recognize users, but authentication aims to minimize false-positive recognitions, while identification treats false positives and false negatives equally (definition by Rogers et al.5).
Conventional authentication methods like PINs seem to work in VR6. Nonetheless, researchers also investigate potentially less intrusive methods, like biometric authentication, which is what this work focuses on. In general, two types of authentication biometrics exist: physical biometrics rely on physiological features of a person, like the fingerprint or the shape of the face, whereas behavioral biometrics focus on a person's behavior, like movement patterns.7
While performing specified motoric tasks can be suitable for authentication2,1, one can assume that continuous evaluation of one's arbitrary movement is less intrusive3. As Miller et al.8 and Schell et al.4 have shown, users can be recognized while doing everyday tasks. That helps to avoid breaking immersion while being in VR, which is desired in most cases. I want to do further research on the methodology of Schell et al.4; that is, I aim to further improve recognition based on arbitrary movements in a social, work-based context.
All of the publications from table 1 use single-session data, including the work of Schell et al.4. That is problematic for machine-learning-based recognition, since networks might learn session-dependent movement patterns. Multi-session data could prevent that. Therefore, I collect data from a VR course with 14 participants over 14 weeks, with one 1.5-hour session per week.
Goals
This work pursues the following goal: I aim to extend the results found by Schell et al.4. Users' movements will be analyzed in order to create personalized movement profiles. I plan to use the methodology of Schell et al.4 as a baseline for comparison with a new approach: using inverse kinematics (IK) to estimate additional human joint information. This information is expected to improve the capability of the deep user recognition network. Since IK estimations rely on body information about the user, I will compare the effects of IK under two conditions:
a) averaged human parameters for body height, arm length and shoulder distance
b) measured parameters of each user for body height, arm length and shoulder distance.
This results in two research questions:
- Can the results of Schell et al.4 on user identification in VR be replicated in a different setting, namely VR teaching?
- Can the results of Schell et al.4 be improved by estimating additional human joints by means of inverse kinematics,
- with averaged human parameters?
- with personalized user parameters based on measurements?
Methods and schedule
Methods
Schell et al.4 compared different methodologies for user recognition. I benefit from their results and plan to implement what produced the best outcomes in their research.
- First, I plan to use Gated Recurrent Units (GRUs) for the machine learning process. The data will be movement data, i.e. sequential data. Recurrent Neural Networks (RNNs) are the architectures that seem to treat sequential data best, and the GRU is one such variant. This architecture has proven to be suited for user recognition based on behavior4.
- Second, the plan is to enrich the data set by using inverse kinematics. I will use applications that apply IK to the data to estimate coordinates for the following three missing joints: elbows, shoulders and neck. By this, I hope to
- improve training of the network, and
- reduce the observation time needed for a precise prediction.
- Third, I plan to use velocity data. That is, I neglect the users' absolute positions and rotations in the scene. Rather, I feed the network the differences in position and rotation between two consecutive frames, divided by the elapsed time. This approach prevents the network from learning scene-related information4 and helps it focus on the movement itself.
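To make the first and third point concrete, the velocity transformation and a single GRU update can be sketched as follows. This is a minimal NumPy illustration under assumed data shapes (frames as rows of joint coordinates, a fixed 90 Hz sampling rate) and random weights; it is not the pipeline of Schell et al., and actual training would use a deep-learning framework.

```python
import numpy as np

def velocity_features(frames, dt):
    """Convert absolute pose data (T x D) into velocities:
    frame-to-frame differences divided by the elapsed time."""
    return np.diff(frames, axis=0) / dt

def gru_step(x, h, params):
    """One GRU cell update for input x and previous hidden state h."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(Wz @ x + Uz @ h + bz)               # update gate
    r = sigmoid(Wr @ x + Ur @ h + br)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)   # candidate state
    return (1.0 - z) * h + z * h_tilde

# Toy example: 5 frames of 6-D pose data sampled at 90 Hz.
rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 6))
vel = velocity_features(frames, dt=1.0 / 90.0)      # shape (4, 6)

hidden = 8
params = [rng.normal(scale=0.1, size=s)
          for s in [(hidden, 6), (hidden, hidden), (hidden,)] * 3]
h = np.zeros(hidden)
for x in vel:                                       # unroll over the sequence
    h = gru_step(x, h, params)
print(h.shape)  # (8,)
```

The final hidden state summarizes the movement sequence; in the actual approach it would feed a classification layer over the known users.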
Schedule
I plan to pursue the following steps in chronological order:
- Step 1: Collecting data from 14 persons who attend the course Praxisorientierte Planung und Gestaltung von Schulunterricht mit Virtual Reality (practice-oriented planning and design of school lessons with Virtual Reality), where attendants are taught how to teach in VR. The seminar takes place once a week over a total of 14 weeks; each session lasts 1.5 hours.
- Step 3: Generating three additional joint features by means of inverse kinematics: neck, shoulders and elbows
- Step 4: Verifying the estimated joints on generic human skeletons in the Unity game engine
- Step 5: Implementing the velocity approach. Instead of absolute joint data I will use velocity data, computed as the difference between two consecutive frames divided by the elapsed time.
- Step 6: Comparing the results of Gated Recurrent Units on three data sets.
- The raw data set without IK estimations is used as a baseline for evaluating the following two data sets.
- A second data set, identical to the first but enriched with the three additional joints estimated by means of inverse kinematics, using averaged human skeleton information.
- A third data set that investigates the joint estimation further: I will examine whether the neural network benefits from more precise inverse kinematics data. By measuring each user's body parameters, I expect the IK estimation to be more precise and thus to improve the network's accuracy.
- Step 8: Hyperparameter search
- Step 9: Evaluation of results
- Step 10: Writing the thesis
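To illustrate the kind of joint estimation planned in the IK step, a minimal two-bone sketch can place an elbow between a tracked shoulder and wrist using the law of cosines. The bone lengths, positions and bend direction here are made-up assumptions, and a real IK solver (e.g. inside Unity) would additionally handle joint rotations and anatomical constraints.

```python
import numpy as np

def estimate_elbow(shoulder, wrist, upper_len, fore_len, bend_hint):
    """Two-bone IK: place the elbow so that the upper-arm and forearm
    lengths are respected, bending towards bend_hint."""
    axis = wrist - shoulder
    d = np.linalg.norm(axis)
    # Clamp the shoulder-wrist distance to the arm's reachable range.
    d = np.clip(d, abs(upper_len - fore_len) + 1e-9,
                upper_len + fore_len - 1e-9)
    u = axis / np.linalg.norm(axis)
    # Law of cosines: distance from the shoulder, along u, to the
    # elbow's projection onto the shoulder-wrist axis.
    a = (upper_len**2 + d**2 - fore_len**2) / (2.0 * d)
    h = np.sqrt(max(upper_len**2 - a**2, 0.0))
    # Bend direction: component of the hint perpendicular to the axis.
    n = bend_hint - np.dot(bend_hint, u) * u
    n = n / np.linalg.norm(n)
    return shoulder + a * u + h * n

# Assumed example measurements in meters (elbows bend downwards).
shoulder = np.array([0.0, 1.4, 0.0])
wrist = np.array([0.3, 1.1, 0.1])
elbow = estimate_elbow(shoulder, wrist, upper_len=0.30, fore_len=0.28,
                       bend_hint=np.array([0.0, -1.0, 0.0]))
```

The estimated elbow sits exactly one upper-arm length from the shoulder and one forearm length from the wrist, which is the property step 4 would verify on generic skeletons.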
Resources
Table 1: Recent publications on machine-learning based identification or authentication

Literature
- Z. Yu, H. Liang, C. Fleming and K. L. Man, "An exploration of usable authentication mechanisms for virtual reality systems," 2016 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), 2016, pp. 458-460. https://doi.org/10.1109/APCCAS.2016.7804002
- Kupin, A., Moeller, B., Jiang, Y., Banerjee, N.K., Banerjee, S. (2019). Task-Driven Biometric Authentication of Users in Virtual Reality (VR) Environments. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, W.H., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol 11295. Springer, Cham. https://doi.org/10.1007/978-3-030-05710-7_5
- Tahrima Mustafa, Richard Matovu, Abdul Serwadda, and Nicholas Muirhead. 2018. Unsure How to Authenticate on Your VR Headset? Come on, Use Your Head! In Proceedings of the Fourth ACM International Workshop on Security and Privacy Analytics (IWSPA '18). Association for Computing Machinery, New York, NY, USA, 23–30. https://doi.org/10.1145/3180445.3180450
- Christian Schell, Andreas Hotho, Marc E. Latoschik. 2021. User and Avatar Identification for XR by Deep Learning of Arbitrary Motion Data Sequences. Unpublished manuscript.
- C. E. Rogers, A. W. Witt, A. D. Solomon, and K. K. Venkatasubramanian, "An approach for user identification for head-mounted displays," ISWC 2015 - Proceedings of the 2015 ACM International Symposium on Wearable Computers, pp. 143–146, 2015.
- George, Ceenu, Khamis, Mohamed, Zezschwitz, Emanuel, Burger, Marinus, Schmidt, Henri, Alt, Florian, Hussmann, Heinrich. (2017). Seamless and Secure VR: Adapting and Evaluating Established Authentication Systems for Virtual Reality. https://doi.org/10.14722/usec.2017.23028
- Revelock blog: Physical Biometrics vs. Behavioral Biometrics. https://www.revelock.com/en/blog/physical-biometrics-vs-behavioral-biometrics
- Miller, M.R., Herrera, F., Jun, H., et al. Personal identifiability of user tracking data during observation of 360-degree VR video. Sci Rep 10, 17404 (2020). https://doi.org/10.1038/s41598-020-74486-y
Contact Persons at the University of Würzburg
Christian Schell (Primary Contact Person), Universität Würzburg
christian.schell@uni-wuerzburg.de