Aurora Gonzalez is one of the PhD students funded in part by the IoTCrawler. She started her thesis, “Data analytics algorithms for IoT based smart environments”, in November 2015. In December 2019 she will present the result – a compendium of six articles that provide valuable contributions to the IoTCrawler.
Interview with Aurora Gonzalez, UMU
What excites you about your research?
It allows me to use my background in mathematics and enables me to solve real problems. We define a problem, find different sources and then work with them toward a solution. It is a very creative job. My field of research is developing and applying algorithms from machine learning, Big Data etc. in smart environments in order to improve services or create new ones that are more efficient.
These can be smart cities or smart buildings – physical environments in which IoT devices are embedded seamlessly. I investigate how to analyze the data that are collected in these smart environments and how to extract knowledge from them. This includes both fundamental studies and applied problems. Sometimes a particular problem helps us find a gap in the theoretical field, and other times we first develop a methodology that can then be applied to a problem.
Why did you choose this area for your thesis?
I studied mathematics and was working at a statistical consultancy connected to my university when I decided to do a Ph.D. I had always thought of following a research path, and they encouraged me to follow it. I had it in mind, and then my leader suggested I might do this, to be able to work with the computer science and deep learning I could not work with otherwise. At the consultancy we would design statistical experiments and do descriptive analytics and hypothesis testing for researchers from different fields. I learned about reproducible research and different programming practices, but it was mostly about statistical analysis. I saw a potential for the consultancy in working towards using complex algorithms or big data.
I was only there for one year, and in industry it would be very difficult to work in depth on developing a new practice like this.
When the PhD opportunity came about, it seemed like a natural extension of my capability to extract knowledge and do deep learning from data, but with a more focused topic. I could make a contribution to the field.
What are the major insights you and your research group have come across so far?
I would say the article about missing data integration in the IoT environment. It is very important.
We developed a methodology that combines low- and high-precision sensor information to estimate missing data in IoT in real time. Low-precision sensors are quite common, especially now that IoT technology is becoming widespread, and it is important to be able to determine the confidence they provide. Since IoT systems require good-quality sensor data for making real-time decisions, we focus our research on how to allow full recovery and service continuity in the IoT environment. Providing estimates of missing data helps the systems to continue working in real time.
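To illustrate the general idea of weighting sensors by the confidence they provide, here is a minimal sketch. It is not the methodology from the article: it assumes each neighboring reading comes with a known error variance and simply combines them with inverse-variance weights. The function name `fuse_readings` and all the numbers are hypothetical.

```python
def fuse_readings(readings):
    """Combine (value, error_variance) pairs with inverse-variance weights.

    Assumption for illustration: sensor errors are independent and each
    sensor's error variance is known. Returns (estimate, variance).
    """
    weights = [1.0 / variance for _, variance in readings]
    total_weight = sum(weights)
    estimate = sum(w * value for w, (value, _) in zip(weights, readings)) / total_weight
    # The fused estimate is more certain than any single reading.
    fused_variance = 1.0 / total_weight
    return estimate, fused_variance


# Hypothetical neighbors of a sensor whose own reading is missing.
neighbors = [
    (21.0, 0.01),  # high-precision sensor: small error variance
    (23.0, 1.0),   # low-precision sensor: large error variance
    (19.5, 1.0),   # low-precision sensor
]
estimate, fused_variance = fuse_readings(neighbors)
print(estimate, fused_variance)
```

In this sketch the high-precision reading dominates the estimate, while the low-precision readings still contribute a little information instead of being discarded.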
We also use a technique called Bayesian Maximum Entropy. As a Bayesian method, it includes a prior, a meta-prior and a posterior stage. The prior knowledge about the monitored area is obtained with variograms – a spatial analysis of the collected data – or, in other words, by analyzing the relationship between data from neighboring sensors. After that, we use Bayesian conditionalization to integrate the data in different ways depending on whether they come from high- or low-precision sensors. This produces a probability density function that is used to estimate the missing point. In other words, we use the relationships between sensors that are close to each other to estimate the missing value with high accuracy.
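The stages described above can be sketched in a much simplified Gaussian form. This is an illustration under stated assumptions, not the Bayesian Maximum Entropy implementation used in the article: the spatial prior is an assumed exponential covariance model (for a stationary field, the variogram is the sill minus this covariance), high- and low-precision sensors differ only in an assumed noise variance, and the posterior at the missing location comes from ordinary Gaussian conditioning. The names `exp_covariance` and `estimate_missing`, and all parameter values, are hypothetical.

```python
import numpy as np


def exp_covariance(distance, sill=1.0, corr_range=10.0):
    """Assumed exponential covariance C(d); the matching variogram is sill - C(d)."""
    return sill * np.exp(-distance / corr_range)


def estimate_missing(coords, values, noise_var, target, mean,
                     sill=1.0, corr_range=10.0):
    """Gaussian posterior mean and variance of the field at `target`.

    coords:    (n, 2) positions of neighboring sensors
    values:    (n,) their readings
    noise_var: (n,) error variance per sensor (small = high precision)
    mean:      assumed known prior mean of the field
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)

    # Prior stage: covariances implied by the distances between locations.
    d_obs = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d_tgt = np.linalg.norm(coords - target, axis=-1)
    K = exp_covariance(d_obs, sill, corr_range) + np.diag(noise_var)
    k = exp_covariance(d_tgt, sill, corr_range)

    # Conditioning stage: noisier sensors automatically get smaller weights.
    weights = np.linalg.solve(K, k)
    post_mean = mean + weights @ (values - mean)
    post_var = sill - weights @ k
    return post_mean, post_var


# Hypothetical layout: one high-precision and one low-precision neighbor.
post_mean, post_var = estimate_missing(
    coords=[[0.0, 0.0], [1.0, 0.0]],
    values=[25.0, 20.0],
    noise_var=[1e-4, 0.5],
    target=[0.5, 0.0],
    mean=22.0,
)
print(post_mean, post_var)
```

Although the missing point lies midway between the two sensors, the estimate in this sketch is pulled toward the high-precision reading, and the posterior variance gives a measure of how much to trust it.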
What makes the IoTCrawler special?
Other projects have created engines for crawling data, such as Shodan, the search engine for internet-connected services, and Thingful, which are initiatives carried out in Europe. Shodan is considered somewhat dangerous in terms of the possibilities it offers to system attackers. Besides, both are restricted to data sharing and do not really provide a whole architecture that enables comprehensive searches in terms of models, platforms, data and analytics.
Does it have a particular value that the IoTCrawler is a European project?
In Europe we are concerned with privacy and have a strong focus on creating open data communities and a sharing philosophy. Without the open data philosophy, the benefits would be lower. If we are preoccupied with making data “ours”, we will miss out on creating value for public sectors and growing communities.
What are your aspirations and hopes for the IoTCrawler?
It has the potential to be very useful in improving the way researchers do science and innovation. In many places researchers may not have the resources – the time, the tools or the connections – to optimize their work in IoT, and the IoTCrawler will introduce machine intelligence to the whole process and make it optimal.
After I present my thesis I will still be working in this field, and I hope to benefit from the outcome of the IoTCrawler. It will deliver an architecture for interacting with IoT data in an efficient way. At the moment a large part of a researcher’s work is spent on integration rather than on discovery and novel insights. This cannot be realized securely enough in a traditional search engine: IoT has evolved to a point where the IoTCrawler is necessary in order to cope with the dynamic behavior of systems. Once the IoTCrawler exists, we could use machine intelligence to discover data and models that are useful for any domain.
A good work practice is to always automate repetitive tasks and optimize productivity. In the end the IoTCrawler will deliver the means to boost data-driven solutions, which is why I expect to benefit from the outcomes of this project.