On December 7, Till Mossakowski (Otto-von-Guericke University of Magdeburg) will give a talk on
Neuro-symbolic integration for ontology-based classification of structured objects
in room Knowledge (TAB building) at 3pm.
Reference ontologies play an essential role in organising knowledge in the life sciences and other domains. They are built and maintained manually. Since this is an expensive process, many reference ontologies cover only a small fraction of their domain. We develop techniques that automatically extend the coverage of a reference ontology with entities that have not yet been added manually. The extension should be faithful to the (often implicit) design decisions made by the developers of the reference ontology. While this is a generic problem, our use case addresses the Chemical Entities of Biological Interest (ChEBI) ontology with classes of molecules, since the chemical domain is particularly suited to our approach: ChEBI provides annotations that represent the structure of chemical entities (e.g., molecules and functional groups).
We show that classical machine learning approaches can outperform ClassyFire, a rule-based system that represents the state of the art for the task of classifying new molecules and is already being used for the extension of ChEBI. Moreover, we develop RoBERTa and ELECTRA transformer neural networks that achieve even better performance. In addition, the axioms of the ontology can be used during the training of prediction models as a form of semantic loss function. Furthermore, we show that ontology pre-training can improve the performance of transformer networks on the task of predicting the toxicity of molecules. Finally, we show that our model learns to focus attention on more meaningful chemical groups when making predictions with ontology pre-training than without, paving a path towards greater robustness and interpretability. This strategy has general applicability as a neuro-symbolic approach to embedding meaningful semantics into neural networks.
Till Mossakowski is a professor of theoretical computer science at Otto-von-Guericke University of Magdeburg, Germany. He co-designed the Distributed Ontology, Model and Specification Language (DOL), as well as the corresponding Heterogeneous Tool Set (Hets). His research interests are logic, semantics, and neuro-symbolic integration, as well as applications in energy network simulation models.