PhD position in Computer Science with focus on link prediction, big graph processing and Bioinformatics at Inria Nancy (France)
Country/Region : France
Description
Title: Distributed link prediction in large complex graphs: application to biomolecule interactions
Location: Inria Nancy - Grand Est research centre – LORIA, Nancy, France
Closing date: Applications will be evaluated starting from May 5, 2019.
Project-team: CAPSID
Context and positioning:
Today, vast and diverse sources of data exist for almost every scientific domain, making their integration and intelligent exploitation challenging. Indeed, complex data require expressive data representation models such as graphs. The Linked Open Data (LOD) movement, along with the FAIR (Findability, Accessibility, Interoperability, Reusability) data principles, is intended to facilitate heterogeneous data integration and analysis. In the LOD context, graphs are called knowledge graphs as they encompass domain ontologies for typing objects and describing their relationships. Semantic web languages (RDFS, OWL, SPARQL) have reached a level of maturity on which ambitious machine learning techniques can rely. In addition, big data and NoSQL solutions make web-scale data analyses possible. So far, such analyses on dedicated big-data architectures are often limited to MapReduce scenarios on rather simple data models (key-value oriented, homogeneous graphs with only one type of node and one type of edge). Graph databases, as one NoSQL approach, allow for a rich representation of multi-typed attributed nodes and edges. This greater expressivity comes at a cost, as distributing both the graph and the program is not an easy task.
The objective of this PhD project is to advance the state of the art of link prediction in knowledge graphs in a distributed setting [1][2][3], in the context of predicting drug-target associations. Predicting new targets for known drug molecules (often called drug repositioning [5]) offers the possibility of using existing drug molecules in new ways, which is far cheaper and much less time-consuming than developing new drug molecules from scratch. However, the computational challenge here is to make link prediction algorithms capable of taking into account the attributes and labels on both nodes and edges in knowledge graphs. The proposed approaches will be evaluated using web-scale knowledge graphs for inferring missing links (data completion). YAGO, DBpedia, and synthetic benchmarks can be used for such evaluation and validation purposes [4].
Missions:
This PhD thesis project aims to develop scalable link prediction methods in large and complex graphs. More specifically, the aims of this thesis project are:
- to propose link prediction approaches in knowledge graphs based on both graph topology and neighborhood constraints to be defined;
- to design scalable implementations of the proposed approaches for distributed architectures. In this context, the use of big graph processing frameworks such as Pregel, Trinity, GraphLab and BLADYG needs to be studied [6];
- to define evaluation and validation protocols for the proposed algorithms in the context of web-scale knowledge graphs;
- to apply the approach to the prediction of drug-target associations.
This project will be carried out mainly within the Capsid team at Inria Nancy, which combines expertise in knowledge graphs, distributed graph computing [6] and drug-target interactions (https://capsid.loria.fr). Achieving the objectives of the thesis will involve acquiring knowledge and understanding of the current state of the art in link prediction in large and complex graphs. An important aspect of this project will be to explore the use of big graph processing frameworks in order to design scalable implementations of the proposed link prediction methods in knowledge graphs. The proposed techniques will be implemented on a local cluster and evaluated using publicly available data.
This project will develop novel and practical link prediction algorithms that will be applied to predict drug targets. This will help to satisfy an important and current research need in drug repositioning. The developed software will be made publicly available.
Thesis tasks:
- Study existing link prediction algorithms in homogeneous graphs
- Extend the approach to knowledge graphs (by considering node and edge properties)
- Evaluate on well-known public RDF datasets
- Apply to biological knowledge graphs describing drug-target interactions
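As an illustration of the kind of topology-based baseline that the first task above refers to, here is a minimal Python sketch of link prediction in a homogeneous, undirected graph using the Jaccard similarity of neighbor sets. It is only an example of a well-known neighborhood-based score, not the method to be developed in the thesis; the function name jaccard_scores and the toy edge list are hypothetical and chosen purely for illustration.

```python
from itertools import combinations


def jaccard_scores(edges):
    """Score every non-adjacent node pair by the Jaccard similarity
    of the two nodes' neighbor sets (undirected, homogeneous graph)."""
    # Build an adjacency map: node -> set of neighbors.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # the link already exists, nothing to predict
        common = neighbors[u] & neighbors[v]
        union = neighbors[u] | neighbors[v]
        scores[(u, v)] = len(common) / len(union) if union else 0.0
    return scores


# Hypothetical toy graph, for illustration only.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"),
             ("b", "d"), ("c", "d"), ("d", "e")]

# Candidate links ranked from most to least likely under this score.
ranked = sorted(jaccard_scores(toy_edges).items(),
                key=lambda item: item[1], reverse=True)
for pair, score in ranked:
    print(pair, round(score, 3))
```

A score of this kind uses graph topology only. It ignores node and edge types and attributes, and it enumerates all node pairs, which does not scale to web-scale graphs; these are the two limitations that the project description highlights.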
Required qualifications:
Candidates must have a master's degree in computer science, mathematics, or one of the physical sciences. Good programming skills in an object-oriented programming language such as Java or C++ are essential. Experience with NoSQL solutions (Neo4j, Titan, MongoDB), parallel/distributed programming (Spark, Hadoop, Flink) and graph processing frameworks (Pregel, GraphLab, GraphX) is desirable but not essential. A strong interest in structural biology would also be highly desirable.
Advantages
- Duration: 3 years
- Starting date: between Oct. 1st 2019 and Jan. 1st 2020
- Salary: 1 982 euros gross monthly (about 1 593 euros net) during the first and the second years. 2 085 euros the last year (about 1 676 euros net). Medical insurance is included.
Help and benefits:
- Possibility of free French courses
- Help with finding accommodation
- Help with the resident card procedure and with the spouse visa
- Lunch at the Inria canteen costs about 3 €
References:
[1] Seyed Mehran Kazemi and David Poole. SimplE Embedding for Link Prediction in Knowledge Graphs. Advances in Neural Information Processing Systems 31 (NIPS 2018), 4284--4295, 2018.
[2] Behera, Ranjan Kumar, Abhishek Sai Shukla, Sambit Mahapatra, Santanu Kumar Rath, Bibhudatta Sahoo and Swapan Bhattacharya. Map-Reduce based Link Prediction for Large Scale Social Network. The 29th International Conference on Software Engineering and Knowledge Engineering (SEKE), 2017.
[3] Xiaoya Xu, Bo Liu, Jianshe Wu and Licheng Jiao. Link prediction in complex networks via matrix perturbation and decomposition. Scientific Reports - Nature, volume 7, Article number: 14724, 2017.
[4] Melo A., Paulheim H. (2017) Synthesizing Knowledge Graphs for Link and Type Prediction Benchmarking. In: Blomqvist E., Maynard D., Gangemi A., Hoekstra R., Hitzler P., Hartig O. (eds) The Semantic Web. ESWC 2017. Lecture Notes in Computer Science, vol 10249. Springer, Cham.
[5] Sudeep Pushpakom et al. (2019) Drug repurposing: progress, challenges and recommendations. Nature Reviews Drug Discovery volume 18, pages 41–58 (2019)
[6] S. Aridhi, E. Mephu Nguifo. Big Graph Mining: Frameworks and Techniques. Big Data Research (BDR), Elsevier, 9(C), pp. 9-17, 2017.
Supervision and contact :
Sabeur Aridhi, sabeur.aridhi-AT-loria.fr, https://members.loria.fr/SAridhi
Malika Smail-Tabbone, malika.smail-AT-loria.fr, https://members.loria.fr/MSmail
The required documents for applying are the following :
- CV
- a motivation letter
- your degree certificates and transcripts for Bachelor and Master (or the last 5 years if not applicable)
- Master thesis (or equivalent) if it is already completed, or a description of the work in progress, otherwise
- all your publications, if any (it is not expected that you have any).
- at least one recommendation letter from the person who supervises or supervised your Master thesis (or research project or internship); you can also send at most two other recommendation letters.
The recommendation letter(s) should be sent directly by their author to the prospective PhD advisor.
All the documents should be sent in at most two PDF files: one file should contain the publications, if any, and the other file should contain all the other documents. These two files should be sent to your prospective PhD advisor (in addition to the application on the web).
Last modified: 2019-04-24 08:08:57