Engineering and Architecture
Computer Science and Mathematics of Security
Synergies between machine learning and privacy
Modern societies produce and consume enormous amounts and varieties of data on a daily basis. Even though the availability of such data is crucial in most areas of research, these data very often contain personal information that cannot be shared with third parties or released publicly without adequate protection, so as to ensure the fundamental right to privacy of the individuals they refer to. Classic data protection algorithms are designed only for structured data and/or require a significant amount of human intervention, and thus can hardly cope with the volume of data to be protected. There is therefore a need for fully automatic data protection that supports a variety of data types and formats.
On the other hand, modern machine learning has shifted from centralized architectures, in which a single trusted server compiles all the data and trains the machine learning models, to decentralized settings, such as federated learning or fully decentralized learning, in which the learning effort is distributed among peers operating in a (typically open and untrusted) network. Even though these decentralized architectures alleviate the learning bottleneck at the server side, they are also more vulnerable to security and privacy attacks, owing to the untrustworthiness of the participants and of the network itself. Because these decentralized machine learning architectures are so recent, the state of the art in privacy- and security-enhancing methods for them is still in its infancy, and can barely cope with the variety and complexity of the privacy and security attacks that have emerged.
The goal of this thesis is to find two-way synergies between machine learning and privacy-enhancing technologies. On the one hand, this implies leveraging state-of-the-art machine learning algorithms, such as transformers, embedding models and generative adversarial networks, to automate data protection; in particular, the detection and masking of sensitive pieces of information in semi-structured or unstructured sources, or the generation of privacy-preserving synthetic data that faithfully represent the distribution of the original data. On the other hand, privacy-enhancing methods can be designed or adapted for decentralized machine learning scenarios, so that they are able to detect and filter out malicious peers, and to prevent or neutralize the privacy and security attacks those peers may orchestrate. The ultimate goal is to reach a virtuous cycle in which privacy-enhancing methods bring strong privacy and security to the decentralized learning of models that, in turn, can be used to automatically protect sensitive personal information so that it can be freely released for research or in response to open data initiatives.
Highly desirable attributes of the ideal candidate
* Demonstrated experience in one or more of the following topics: privacy, machine learning
* Hold a Master's degree, or equivalent, in Computer Science, Mathematics or a similar subject
* Proficiency in English (written and oral) is required.
* Knowledge of programming languages such as Python or Java is required.
* Knowledge of machine learning frameworks such as Scikit-learn, TensorFlow, PyTorch or Apache MXNet is highly recommended.
* Interpersonal skills and the ability to work in a multidisciplinary team are recommended.
Ethics: This project does not involve ethical issues
Workplace Location: Campus Sescelades, Tarragona
37.5 hours a week
14 February 2022
This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 945413