Axis 5 - Modelling, probabilistic models and research methods
Avner Bar-Hen (PRCM, CNAM), Yvon Pesqueux (PRCM) and Michel Béra (PRCM, CNAM)
Associated researchers: Pr. Michael Spence, Didier Le Ruyet (PR1, CNAM), Pr. Robert Cario, Pr. Martine Herzog-Evans, Pr. Jean-Philippe Denis
This research area, led by Professors Bar-Hen, Pesqueux and Béra, focuses on the use (or misuse) of research methodologies applied to security issues. It covers the development of new actuarial standards, the critical examination of large-scale data models such as LLMs (Avner Bar-Hen), and the impact of management methods on public order and societal organisations (Y. Pesqueux).
For example, Michel Béra examines the importance of quantifying re-identification risk and the considerable computational burden this entails for large datasets (on the order of 100 variables and millions of individuals), and proposes measures that are quicker to compute and could serve as indicators of each record's contribution to the overall extreme risk of re-identification (Ottawa conference paper).
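As a purely illustrative sketch of what such a quick per-record indicator might look like (not Béra's measure), one can score each record by the inverse size of its quasi-identifier equivalence class: records that are unique or near-unique on those attributes are the natural candidates for the extreme tail of re-identification risk. The column names below are hypothetical.

```python
import pandas as pd

def rarity_scores(df: pd.DataFrame, quasi_identifiers: list[str]) -> pd.Series:
    """Return 1 / (size of each record's quasi-identifier equivalence class).

    Records that are unique on the chosen quasi-identifiers score 1.0 and
    contribute most to the extreme tail of re-identification risk.
    """
    class_sizes = df.groupby(quasi_identifiers)[quasi_identifiers[0]].transform("size")
    return 1.0 / class_sizes

# Hypothetical usage on a dataset with ~100 variables and millions of rows:
# df = pd.read_parquet("records.parquet")
# scores = rarity_scores(df, ["zip_code", "birth_year", "gender"])
# top_risk = scores.nlargest(1000)   # records driving the extreme risk
```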
Personal data concerning large segments of the population (whether human beings or legal entities) is proliferating in public and private administrations. It is recognised as a valuable asset for political and commercial purposes.
Unfortunately, it is also valuable for malicious purposes and is frequently the target of compromise attempts. Legislation is strict: custodians of personal or professional databases face severe penalties, as well as reputational damage, in the event of data theft. The consequences are even more serious if the identity of natural or legal persons is revealed. Data custodians are obliged to protect the identity of the persons the data concern, whether or not the data are intended to be shared with the authorities or with researchers.
Anonymization and pseudonymization are two categories of measures designed to protect the identity of data subjects. While anonymization is considered irreversible and pseudonymization reversible, the proliferation of data generated and collected by devices and software tools keeps pushing datasets from the anonymous category towards the pseudonymous one: what is anonymous today may not be tomorrow.
From this point of view, every dataset carries a risk of re-identification. The QaR method (AFNOR, 2020) proposes a measure of a dataset's re-identification risk and a statistical technique, based on extreme value theory, for estimating it. This risk carries real financial stakes, as the compensation awarded in class actions over data theft shows. The measure can be used to assess the effectiveness of the disclosure controls custodians apply to data; it can be reported to regulatory authorities to demonstrate the attention custodians pay to protecting the privacy of data subjects (Solvency II obligations for insurers, GDPR Article 35 obligations for data controllers); and it can be used to price an insurance premium against unauthorised disclosure, or to size the balance-sheet provision custodians need to cover the potential financial damage of such a disclosure. The logic is that of household insurance: a house without an alarm system or reinforced doors attracts a higher compulsory premium. Anglo-Saxon practice has understood this well: rather than relying on technology alone, the incentive works through the wallet, via insurance premiums.
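The QaR specification itself is not reproduced here. As a hedged illustration of the general idea only, the sketch below fits a generalized Pareto distribution to the upper tail of per-record rarity scores (such as those computed above) and reads off a high quantile as an indicator of extreme re-identification risk; the threshold, quantile and simulated data are illustrative assumptions, not the AFNOR parameters.

```python
import numpy as np
from scipy.stats import genpareto

def extreme_risk_indicator(scores: np.ndarray, tail_fraction: float = 0.05,
                           quantile: float = 0.999) -> float:
    """Peaks-over-threshold estimate of an extreme quantile of risk scores.

    Fits a generalized Pareto distribution to exceedances above a high
    empirical threshold. This is a generic extreme-value sketch, not the
    QaR method itself.
    """
    threshold = np.quantile(scores, 1.0 - tail_fraction)
    exceedances = scores[scores > threshold] - threshold
    shape, loc, scale = genpareto.fit(exceedances, floc=0.0)
    # Quantile of the fitted exceedance distribution, shifted back above the threshold.
    return threshold + genpareto.ppf(quantile, shape, loc=loc, scale=scale)

# Illustrative use with simulated heavy-tailed scores:
rng = np.random.default_rng(0)
scores = rng.pareto(3.0, size=1_000_000)
print(extreme_risk_indicator(scores))
```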
Work in this axis also addresses the reliable assessment of a deep neural network's confidence and the prediction of its failures, which are of paramount importance for the practical deployment of these models. A new target criterion for model confidence is proposed, corresponding to the true class probability (TCP).
Since the true class is inherently unknown at test time, the TCP criterion is learned on the training set through a learning scheme adapted to this context. Extensive experiments validate the relevance of the approach across different network architectures and small- and large-scale datasets for image classification and semantic segmentation. The approach systematically outperforms strong baselines, from the maximum class probability (MCP) to Bayesian uncertainty, as well as recent approaches specifically designed for failure prediction.
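As a minimal sketch of the distinction (not the published implementation), the snippet below computes, for a batch of softmax outputs, the maximum class probability (MCP) used as the standard confidence score and the true class probability (TCP) that an auxiliary confidence model is trained to regress; the tensors and values are toy examples.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mcp_and_tcp(logits: np.ndarray, true_labels: np.ndarray):
    """Return (MCP, TCP) for each sample.

    MCP = max_k p(y = k | x)   -- available at test time.
    TCP = p(y = y_true | x)    -- requires the true label, hence it is learned
                                  on the training set by an auxiliary model.
    """
    probs = softmax(logits)
    mcp = probs.max(axis=1)
    tcp = probs[np.arange(len(true_labels)), true_labels]
    return mcp, tcp

# Toy batch of 3 samples over 4 classes (illustrative values only).
logits = np.array([[2.0, 0.1, 0.1, 0.1],
                   [0.2, 0.1, 1.5, 1.4],
                   [0.0, 0.0, 0.0, 3.0]])
labels = np.array([0, 3, 1])   # samples 2 and 3 are misclassified
mcp, tcp = mcp_and_tcp(logits, labels)
print(mcp, tcp)                # a low TCP flags a likely failure
```

When TCP is much lower than MCP, the network is confidently wrong; predicting TCP from information available at test time is precisely what the learning scheme described above aims to do.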
Based on the work of Yvon Pesqueux, this line of research also examines the ontological, epistemological and methodological dimensions of applied research in security, defence, intelligence and criminology.
Over and above the debate on the distinction between qualitative and quantitative approaches, this line of research addresses the following questions:
- Ethnography as a privileged tool for producing socially useful local knowledge, drawing on a discussion of H. S. Becker's 'Tricks of the Trade', the importance of the descriptive task, the survey, storytelling and the narrative; the case study, the longitudinal case study, and the situation treated as a case.
- 'Classical' qualitative methods and techniques: direct observation, interview techniques, 'active' data collection methods, triangulation of data collection methods, internal and external validity of qualitative research, participant observation, and grounded theory.
- The stages of qualitative research: pre-analysis; the data analysis or coding phase; categorization, linking and representation of results; and the data verification phase.
- The sociology of translation and actor-network theory; Visual Studies; the Critical Incident Method (CIM); arts-based research methods; the logbook; action research and field research.
Related research questions
- Deep Learning and Big Data: uses and limits for crime detection and prevention
- Distributed predictive modelling of criminal involvement (tipping points, acting out, behavioural convergences)
- Application of extreme data models to the study of criminal phenomena
- Singletons and outliers in the prediction of lone wolf phenomena
- Aberrant data and truncated data in the study of criminal phenomena