Doctoral Thesis Defense: A Machine Learning Model Selection Algorithm for Stream Data
- Speakers
Student: Anderson Chaves da Silva
- Useful information
Advisors:
Fabio André Machado Porto - Laboratório Nacional de Computação Científica - LNCC
Examination Committee:
Fabio André Machado Porto - Laboratório Nacional de Computação Científica - LNCC (chair)
Antônio Tadeu Azevedo Gomes - Laboratório Nacional de Computação Científica - LNCC
Eduardo Soares Ogasawara
Daniel Cardoso Moraes de Oliveira - Universidade Federal Fluminense - UFF
Patrick Valduriez - INRIA - FRA
Alternates:
Gilson Antônio Giraldi - Laboratório Nacional de Computação Científica - LNCC
Abstract: Predictive queries over spatiotemporal (ST) stream data present substantial challenges in data processing and analysis. ST data streams encompass a series of time-dependent data distributions that vary across both space and time, often displaying distinct and dynamic patterns. Relying on a single machine learning model designed for a specific data distribution frequently leads to suboptimal outcomes, as such a model is unlikely to capture the diverse behaviors present in different spatiotemporal regions. Traditional ensemble methods, which aim to leverage the complementary strengths of multiple base models, tend to suffer from high computational costs and subpar performance when applied to ST data, because integrating each model’s contribution effectively is complex. Likewise, global models trained on comprehensive datasets are often inadequate: they face insufficient data, higher complexity, and the inefficiency of retraining when more specialized models are already available. To address these limitations, we propose an approach that optimizes predictive accuracy by considering the training data and generalization errors of the available models, as well as the target data distribution. For each time series, our method selects the most suitable model. Based on these principles, we developed StreamEnsemble, an innovative method for processing predictive queries over ST data that dynamically combines multiple candidate models. Experimental results demonstrate that StreamEnsemble significantly outperforms both traditional ensemble techniques and single-model approaches, reducing prediction error by more than a factor of ten while achieving faster execution times.
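Illustrative sketch (not part of the thesis): the abstract does not give the selection rule, so the Python below shows only the general idea of choosing one model per time series from its generalization error and the fit between its training data and the target data. The `Candidate` fields, the `distribution_gap` measure, the `weight` parameter, and the sample values are all hypothetical and should not be read as the StreamEnsemble algorithm.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, Sequence


@dataclass
class Candidate:
    """A pre-trained model plus summaries of how it was trained (hypothetical)."""
    name: str
    train_mean: float   # mean of the data the model was trained on
    train_std: float    # standard deviation of that training data
    gen_error: float    # error measured on held-out data
    predict: Callable[[Sequence[float]], float]


def distribution_gap(window: Sequence[float], cand: Candidate) -> float:
    """Crude distance between a target window and a model's training data.

    Assumption: comparing mean and standard deviation stands in for a
    proper comparison of distributions.
    """
    return (abs(mean(window) - cand.train_mean)
            + abs(pstdev(window) - cand.train_std))


def select_model(window: Sequence[float],
                 candidates: Sequence[Candidate],
                 weight: float = 1.0) -> Candidate:
    """Pick the candidate with the lowest combined score for this window."""
    return min(
        candidates,
        key=lambda c: c.gen_error + weight * distribution_gap(window, c),
    )


# Two toy candidates, each trained on a different regime (made-up numbers).
candidates = [
    Candidate("low_regime", 10.0, 1.0, 0.5, lambda w: w[-1]),
    Candidate("high_regime", 50.0, 5.0, 0.8, lambda w: mean(w)),
]

# One recent window per time series, keyed by a made-up spatial cell id.
series = {
    "cell_a": [9.0, 10.0, 11.0],
    "cell_b": [45.0, 50.0, 55.0],
}

# Select a model independently for each series.
chosen = {key: select_model(w, candidates).name for key, w in series.items()}
print(chosen)
```

In this toy setup each series is routed to the candidate whose training data most resembles it, which is the intuition the abstract describes; a real system would need a more faithful distribution comparison and a principled way to weigh it against the generalization error.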