Machine Learning and Artificial Neural Networks - IDSIA

Scientific area

Machine Learning and Artificial Neural Networks

Our primary research directions in Machine Learning and Artificial Neural Networks include:
  • Recurrent Neural Networks
  • Machine Learning for Temporal Data Processing
  • Probabilistic Graphical Models and Causality
  • Reinforcement Learning
  • Geometric Deep Learning
 
Area leaders:
Cesare Alippi (USI)
Jürgen Schmidhuber (USI)
Marco Zaffalon (SUPSI)


Recurrent neural networks

The human brain is a recurrent neural network (RNN): a network of neurons with feedback connections. It can learn many behaviors / sequence processing tasks / algorithms / programs that are not learnable by traditional machine learning methods. This explains the rapidly growing interest in artificial RNNs for technical applications: general computers which can learn algorithms to map input sequences to output sequences, with or without a teacher.

RNNs are computationally more powerful and biologically more plausible than other adaptive approaches such as Hidden Markov Models (no continuous internal states), feedforward networks and Support Vector Machines (no internal states at all). Our recent applications include adaptive robotics and control, handwriting recognition, speech recognition, keyword spotting, music composition, attentive vision, protein analysis, stock market prediction, and many other sequence problems.

Early RNNs of the 1990s could not learn to look far back into the past. Their problems were first rigorously analysed, within Schmidhuber's RNN long time lag project, by his former PhD student Hochreiter (1991). A feedback network called "Long Short-Term Memory" (LSTM; Neural Computation, 1997) overcomes the fundamental problems of traditional RNNs and efficiently learns to solve many previously unlearnable tasks involving:
  • Recognition of temporally extended patterns in noisy input sequences
  • Recognition of the temporal order of widely separated events in noisy input streams
  • Extraction of information conveyed by the temporal distance between events
  • Stable generation of precisely timed rhythms, smooth and non-smooth periodic trajectories
  • Robust storage of high-precision real numbers across extended time intervals.
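
To make the mechanism concrete, a single LSTM step can be sketched in NumPy. This is an illustrative minimal cell with hypothetical parameter names, not a production implementation: the cell state `c` carries information across long time intervals, and the gates learn when to write, keep, and read it.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold the stacked parameters of the
    input, forget, cell and output gates (4*H rows each)."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b            # stacked pre-activations
    i = sigmoid(z[0:H])                   # input gate: what to write
    f = sigmoid(z[H:2*H])                 # forget gate: what to keep
    g = np.tanh(z[2*H:3*H])               # candidate cell update
    o = sigmoid(z[3*H:4*H])               # output gate: what to read
    c = f * c_prev + i * g                # cell state: the "memory"
    h = o * np.tanh(c)                    # hidden state / output
    return h, c

# tiny usage example: process a length-5 sequence with random weights
rng = np.random.default_rng(0)
H, D = 3, 2
W, U, b = rng.normal(size=(4*H, D)), rng.normal(size=(4*H, H)), np.zeros(4*H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):
    h, c = lstm_step(x, h, c, W, U, b)
```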

LSTM has transformed machine learning and Artificial Intelligence (AI), and is now available to billions of users through the world's four most valuable public companies: Apple (#1 as of March 31, 2017), Google (Alphabet, #2), Microsoft (#3), and Amazon (#4).
 
Deep Learning since 1991: First Very Deep Learners to Win Official Contests in Pattern Recognition, Object Detection, Image Segmentation, Sequence Learning, Through Fast & Deep / Recurrent Neural Networks.
Deep Learning in Artificial Neural Networks (NNs) is about credit assignment across many (not just a few) subsequent computational stages or layers, in deep or recurrent NNs.

The first Deep Learning systems of the feedforward multilayer perceptron type were created half a century ago (Ivakhnenko et al., 1965, 1967, 1968, 1971). The 1971 paper already described an adaptive deep network with 8 layers of neurons.
Recently the field has experienced a resurgence. Since 2009, our Deep Learning team has won nine first prizes in important and highly competitive international pattern recognition competitions (each with a secret test set known only to the organisers), far more than any other team. Our neural nets were also the first Very Deep Learners to win such contests (e.g., on classification, object detection, and segmentation), and the first machine learning methods to reach superhuman performance in such a contest.

Competitions won:
  • MICCAI 2013 Grand Challenge on Mitosis Detection
  • ICPR 2012 Contest on Mitosis Detection in Breast Cancer Histological Images
  • ISBI 2012 Brain Image Segmentation Challenge (with superhuman pixel error rate)
  • IJCNN 2011 Traffic Sign Recognition Competition (only our method achieved superhuman results)
  • ICDAR 2011 offline Chinese Handwriting Competition
  • Online German Traffic Sign Recognition Contest
  • ICDAR 2009 Arabic Connected Handwriting Competition
  • ICDAR 2009 Handwritten Farsi/Arabic Character Recognition Competition
  • ICDAR 2009 French Connected Handwriting Competition.


A time series is a sequence of observations of the same variable, collected over time. Forecasting is the problem of predicting how the time series will evolve in the future. We estimate both the most probable development of the time series and its uncertainty. On this topic, three PhD students of our group have been awarded at the 2025 International Symposium on Forecasting.
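
As a toy illustration of a point forecast with uncertainty (a textbook random-walk-with-drift model, not one of our published methods), the predictive interval widens as the forecast horizon grows:

```python
import numpy as np

def forecast_drift(y, horizon, z=1.96):
    """Random-walk-with-drift forecast: point prediction plus an
    approximate 95% predictive interval widening with the horizon."""
    diffs = np.diff(y)
    drift = diffs.mean()                  # average one-step change
    sigma = diffs.std(ddof=1)             # noise scale of one step
    h = np.arange(1, horizon + 1)
    mean = y[-1] + drift * h              # most probable development
    half = z * sigma * np.sqrt(h)         # uncertainty grows like sqrt(h)
    return mean, mean - half, mean + half

y = np.array([10.0, 10.4, 10.9, 11.2, 11.8, 12.1])
mean, lo, hi = forecast_drift(y, horizon=3)
```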

We have also expertise in hierarchical forecasting, i.e., forecasting time series characterized by aggregation constraints. For instance, the sum of the forecasts of energy demand of the different regions of a country should equal the forecast of energy demand for the entire country.  We developed algorithms for hierarchies containing both smooth and intermittent time series. 
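
A minimal sketch of the reconciliation idea (a standard OLS projection onto coherent forecasts, with made-up numbers; illustrative only, not one of our published probabilistic methods):

```python
import numpy as np

# Hierarchy: total = region1 + region2. Rows of S: [total, r1, r2].
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

def reconcile_ols(y_hat, S):
    """OLS (projection) reconciliation: map incoherent base forecasts
    to the closest forecasts satisfying the aggregation constraints."""
    G = np.linalg.solve(S.T @ S, S.T)     # (S'S)^{-1} S'
    return S @ (G @ y_hat)

y_hat = np.array([100.0, 55.0, 48.0])     # incoherent: 55 + 48 != 100
y_rec = reconcile_ols(y_hat, S)           # coherent by construction
```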

We are currently running an SNF project on forecasting and hierarchical forecasting. We release our algorithms in open-source packages, and we pursue both methodological research and collaborations with companies.

As an example of collaboration with companies, we developed anomaly detection algorithms for an industrial process whose sensors produce a stream of data.
We also developed a time series classification solution (i.e., assigning a label to a time series) to predict the type of crop (maize, oats, rice, etc.) starting from a temporal sequence of satellite images.
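
A minimal sketch of streaming anomaly detection (a generic robust rolling z-score, illustrative only, not the deployed industrial solution): points deviating too far from the recent robust mean are flagged.

```python
import numpy as np

def rolling_zscore_anomalies(x, window=20, threshold=4.0):
    """Flag points whose deviation from the rolling median exceeds
    `threshold` robust standard deviations (MAD-based scale)."""
    flags = np.zeros(len(x), dtype=bool)
    for t in range(window, len(x)):
        ref = x[t - window:t]
        med = np.median(ref)
        mad = np.median(np.abs(ref - med)) + 1e-9   # robust scale
        flags[t] = abs(x[t] - med) / (1.4826 * mad) > threshold
    return flags

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 200)       # simulated sensor stream
x[150] += 12.0                      # inject a sensor fault
flags = rolling_zscore_anomalies(x)
```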
 
  • S. Damato, D. Azzimonti, G. Corani, Forecasting intermittent time series with Gaussian Processes and Tweedie likelihood, International Journal of Forecasting, 2025.
  • L. Zambon, D. Azzimonti, N. Rubattu, G. Corani, Probabilistic reconciliation of mixed-type hierarchical time series, Proc. UAI 2024 (40th Conference on Uncertainty in Artificial Intelligence).
  • L. Zambon, D. Azzimonti, G. Corani, Efficient probabilistic reconciliation of forecasts for real-valued and count time series, Statistics and Computing, 34:21, 2024.
  • G. Corani, D. Azzimonti, N. Rubattu, Probabilistic reconciliation of count time series, International Journal of Forecasting, 40(2), 457-469, 2024.
  • G. Corani, A. Benavoli, M. Zaffalon, Time series forecasting with Gaussian Processes needs priors, Proc. European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 103–117, 2021.


Bayesian Networks (BNs)

Bayesian Networks are foundational models in machine learning and artificial intelligence. They provide a qualitative view of the structure of complex data sets and, at the same time, build on their probabilistic foundations to deliver rigorous and interpretable models. Bayesian networks are quickly becoming a tool of choice in a wide range of application fields, from clinical practice to epidemiology, genetics and environmental sciences.

Our work at IDSIA aims to adapt and extend Bayesian networks to handle the heterogeneous, complex data that characterise applications at the frontier of research. This includes, for instance, handling big data in a computationally efficient way; taking into account the time and space dimensions; using incomplete data in effective ways; and combining sets of data collected under different experimental conditions. The focus is on providing production-ready software implementations of both industry-standard approaches and our own original methods for use in applications. Life and physical sciences require the complete interpretability that characterises Bayesian networks compared to other machine learning models, and problems at the forefront of research in those disciplines routinely produce data like those described above.
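
To make the probabilistic foundations concrete, here is a toy sketch in plain Python (illustrative, made-up probabilities; not our production software) that answers a query in a three-node network by brute-force enumeration:

```python
from itertools import product

# Toy sprinkler network: Rain -> Sprinkler, {Sprinkler, Rain} -> WetGrass.
p_rain = {True: 0.2, False: 0.8}
p_sprinkler = {True: {True: 0.01, False: 0.99},   # P(S | Rain=True)
               False: {True: 0.4, False: 0.6}}    # P(S | Rain=False)
p_wet = {(True, True): 0.99, (True, False): 0.9,  # P(W=True | S, R),
         (False, True): 0.8, (False, False): 0.0} # keyed (sprinkler, rain)

def joint(r, s, w):
    """P(Rain=r, Sprinkler=s, Wet=w) via the chain rule of the DAG."""
    pw = p_wet[(s, r)]
    return p_rain[r] * p_sprinkler[r][s] * (pw if w else 1.0 - pw)

def posterior_rain_given_wet():
    """P(Rain=True | Wet=True) by brute-force enumeration."""
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return num / den
```

Real networks replace this exhaustive sum with exact or approximate inference algorithms that exploit the graph structure.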
  • M. Scutari and J.-B. Denis (2021). Bayesian Networks with Examples in R. Chapman & Hall, 2nd edition.
  • M. Scutari (2020). Bayesian Network Models for Incomplete and Dynamic Data. Statistica Neerlandica, 74(3), 397–419.
  • M. Scutari, C. E. Graafland and J. M. Gutiérrez (2019). Who Learns Better Bayesian Network Structures: Accuracy and Speed of Structure Learning Algorithms. International Journal of Approximate Reasoning, 115, 235–253.
  • M. Scutari, C. Vitolo and A. Tucker (2019). Learning Bayesian Networks from Big Data with Greedy Search: Computational Complexity and Efficient Implementation. Statistics and Computing, 29(5), 1095–1108.


Causal analysis and Knowledge Engineering


Traditional machine learning is based on statistical models that try to capture correlations in the training data, with the goal of achieving accurate predictions on previously unobserved data. Yet, understanding the causal relations between the model variables requires dedicated mathematical tools, and we develop them in this research topic.

Judea Pearl's structural causal models are among the most prominent examples of the mathematical tools that can help us unravel the complex causal relationships across data. These models, based on Bayesian networks, allow us to answer more complex queries, such as the effect of interventions on some variables and counterfactuals.

In some recent papers, IDSIA researchers identified an equivalence relation between causal models and credal networks, a generalised class of Bayesian networks with which IDSIA has long-standing experience. Causal analysis through credal network equivalence appears to be a promising research direction worthy of investigation in the coming years. Vice versa, some new algorithms recently developed for causal queries could be used for credal networks. IDSIA traditionally uses credal networks in applied projects to model expert knowledge (knowledge engineering) and to support or explain the corresponding decisions. It seems possible to develop new approximate techniques for these models and apply them to such problems. Notably, these pieces of theoretical research have always been supported by free software libraries developed by the IDSIA team that implement the new algorithms; the same is expected for the future work considered above.
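
To give the flavour of credal inference (a toy sketch with made-up numbers, not the CREDICI algorithms): when a probability is only known to lie in an interval, queries yield bounds rather than point values. Since the query below is linear in the free parameter, its extrema are attained at the interval endpoints:

```python
def credal_bounds(p_a_interval, p_b_given_a):
    """Bounds on P(B=True) when P(A=True) is only known to lie in an
    interval. The query is linear in P(A=True), so it suffices to
    evaluate it at the two endpoints (vertices of the credal set)."""
    vals = [p * p_b_given_a[True] + (1 - p) * p_b_given_a[False]
            for p in p_a_interval]
    return min(vals), max(vals)

# P(A=True) in [0.1, 0.3]; P(B=True | A=True)=0.9, P(B=True | A=False)=0.2
lo, hi = credal_bounds((0.1, 0.3), {True: 0.9, False: 0.2})
```

Interval answers like `[lo, hi]` are what make credal networks attractive for cautious expert-knowledge modelling: they never claim more precision than the elicited knowledge supports.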
  • Zaffalon, M., Antonucci, A., & Cabañas, R. (2020). Structural causal models are (solvable by) credal networks. In International Conference on Probabilistic Graphical Models (pp. 581-592). PMLR.
  • Cabañas, R., Antonucci, A., Huber, D., & Zaffalon, M. (2020). CREDICI: A Java Library for Causal Inference by Credal Networks. In International Conference on Probabilistic Graphical Models (pp. 597-600). PMLR.
  • Zaffalon, M., Antonucci, A., & Cabañas, R. (2020). Causal Expectation Maximisation. arXiv preprint arXiv:2011.02912.


Graph and geometric deep learning

Graph and geometric deep learning are machine learning fields that combine graph representations of data with learning algorithms to exploit the inductive bias associated with the presence of functional dependencies among data. In this area IDSIA is currently exploring two main research directions: learning high-order graph structures from data and investigating graph state-space dynamical systems.
 
Learning high-order graph structures from data
This investigation line aims at advancing research in representation learning techniques that encode relational information of any order and, at the same time, retrieve that relational structure from data. Here, we identify three main research tasks:
  • Graph learning. The activity aims at developing a scalable methodology to address the problem of making inferences from multivariate data by exploiting relational inductive biases. There are many possible target domains: from physical/virtual sensor networks to knowledge graphs and point clouds.
  • Statistical assessment of (hyper)graph estimators. The research requires investigating suitable statistical tools and developing hypothesis tests to assess the significance of the learned graph, as well as studying conditions under which learnability is guaranteed.
  • Quasi-invertible graph embeddings. Processing a latent space is useful, but often implies losing the explicit relational information. The investigation aims at exploiting techniques and theoretical results developed in the graph learning framework to design embedding methodologies tailored to solve this decoding problem.
Graph state-space neural models
The research aims at building theories, methodologies and tools for (hyper)graph-based predictive models that extend traditional state-space representations.
The main research challenges are:
  • Enable comprehensive modelling of dynamical graph systems through graph representations for inputs, states and, possibly, outputs.
  • Design advanced neural architectures for graph processing in the space of graphs. The problem of affordable computation is of primary relevance whenever we consider large graphs: it follows that computational complexity must be a guideline when designing the computing architecture.
  • Scalability and learning. As the complexity of the architecture and the size of the data get larger, it is necessary to provide sound model selection techniques and performance evaluation criteria to ensure proper fitting. This is a crucial part that is too often improperly carried out in graph processing, due to the lack of statistical tools such as optimality criteria for graph predictors, as well as unbiasedness and consistency of the related graph-state estimator.
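
As a sketch of the kind of message passing these models build on (the widely used GCN propagation rule in NumPy, not IDSIA's ARMA architecture), one graph convolution layer can be written as:

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph convolution: propagate node features X through the
    symmetrically normalised adjacency (with self-loops), then apply
    a linear map W and a ReLU nonlinearity."""
    A_hat = A + np.eye(A.shape[0])        # add self-loops
    d = A_hat.sum(axis=1)                 # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
    return np.maximum(A_norm @ X @ W, 0.0)

# 4-node path graph, 2 input features, 3 output features
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))               # random node features
W = rng.normal(size=(2, 3))               # random layer weights
H = gcn_layer(A, X, W)                    # new node representations
```

Stacking such layers lets each node aggregate information from progressively larger neighbourhoods, which is the inductive bias exploited above.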
  • F. M. Bianchi, D. Grattarola, L. Livi, C. Alippi, Graph Neural Networks with Convolutional ARMA Filters, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
  • D. Grattarola, L. Livi, C. Alippi, Learning Graph Cellular Automata, NeurIPS 2021.


Security for Machine Learning, Machine Learning for security

Security is both a key enabler and an extremely relevant application for machine learning.
  • Security as a key enabler, because the data used by the algorithms should be handled in a privacy-preserving way, and because machine learning algorithms, when deployed in the real world, must be protected from adversaries attempting to steal their value and/or alter their effectiveness.
  • Security as an extremely relevant application, because security can take advantage of machine learning techniques to identify and mitigate attacks. In the context of the Horizon 2020 projects CPSoSAware and EVEREST, IDSIA is currently exploring both aspects.
The research aims, on the one hand, at building methodologies, tools, and architectures to ensure security and privacy for machine learning applications and, on the other, at exploring the use of machine learning techniques to detect attacks and malicious activity early, ultimately improving the resilience of systems.

The main research challenges are:
  • Conceive and experimentally validate methods to protect machine learning algorithms from physical and side-channel attacks without affecting their performance (or, using a complementary approach, explore novel structures for machine learning algorithms that are easier to protect against these attacks).
  • Design and validate algorithms and architectures to preserve the privacy of data in machine learning. Solutions based on homomorphic encryption and/or scalable federated learning are suitable candidates, but they should be improved to make them practical, and their robustness against different attacks should be carefully assessed.
  • Design and practically validate suitable machine learning techniques to detect and react to attacks carried out in large-scale data analytic applications. When real-time operation and a limited energy footprint are strict requirements, the challenge is to ensure the effectiveness of the machine learning algorithms while minimally affecting the performance of the system.
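
As a minimal illustration of the federated idea mentioned above (a generic FedAvg-style aggregation sketch with made-up client sizes, not a CPSoSAware/EVEREST implementation), the server averages client model updates weighted by local dataset size, so raw data never leaves the clients:

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """FedAvg aggregation: combine client model parameters weighted
    by the number of local samples each client holds."""
    total = float(sum(client_sizes))
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three hypothetical clients holding 10, 30 and 60 samples.
updates = [np.array([1.0, 0.0]),
           np.array([0.0, 1.0]),
           np.array([1.0, 1.0])]
global_w = federated_average(updates, [10, 30, 60])
```

In practice this basic scheme must be hardened, e.g. against poisoned updates and gradient-leakage attacks, which is exactly the robustness assessment discussed above.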
