Interview with Data Scientist Stefaniia Legostaieva, AGT International
What is your role in developing the AGT demo?
We had already worked with the Smart Home testbed before the IoTCrawler project started, and much of the knowledge and many of the user insights we gathered early on come from that context. We work as a team – Martin, Pavel, myself and some student supporters – and we each have our area of expertise. We keep each other updated on our tasks, we share ideas and possible solutions for our individual challenges, and then we implement these ideas. Sometimes it is necessary to set more time aside to work on a problem, and if so, we brainstorm in the group to envision how to crack it.
For SmartConnect – which is our application MVP of the IoTCrawler – my contribution is as a Data Scientist. With this demo we want to show how the integration of IoT devices into the IoTCrawler infrastructure can be improved and sped up.
The demo consists of two main tasks. The first is to discover the Smart Home gateway and the second is to extract the connected sensors. As soon as the sensors are discovered and their data can be accessed, the challenge is to understand what kind of sensors we are dealing with, where they are located and what kind of measurements they can take – this is where I step in.
What is important in this process is to automatically or semi-automatically detect what kind of sensor is being viewed – for humans this is easy, but for a machine it needs to be modeled in some way. In order to teach a machine to extract a semantic representation from the data, we need examples – and lots of them – to create a model that extracts that representation from the sensor's description. In doing this we look at the metadata of the device: the name of the device, metadata from the gateway, the sensor's encoding, as well as user-generated input. Given a lot of examples from many different sensors, we can create a model that extracts type, location, and appliance information from the metadata of a sensor.
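To make the idea concrete, here is a minimal, hypothetical sketch of metadata-based sensor classification. It is not the SmartConnect implementation – the sensor descriptions, labels and the choice of scikit-learn are all assumptions made purely for illustration of the principle.

```python
# Illustrative sketch only: a toy classifier that predicts a sensor's type
# from free-text metadata (device name, gateway metadata, encoding, user input).
# The training examples and labels below are made up for demonstration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical metadata strings as they might be collected from Smart Home gateways.
metadata = [
    "FIBARO Motion Sensor FGMS-001 living room zwave",
    "Aqara temperature humidity sensor bedroom zigbee",
    "TP-Link HS110 smart plug washing machine kitchen",
    "Netatmo weather station outdoor module garden",
    "Danfoss living connect thermostat radiator hallway",
]
sensor_type = [
    "motion",
    "temperature_humidity",
    "smart_plug",
    "weather_station",
    "thermostat",
]

# Character n-gram features plus a linear classifier: enough to show the
# principle that sensor type can be inferred from metadata, given many
# labelled examples.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),
    LogisticRegression(max_iter=1000),
)
model.fit(metadata, sensor_type)

# Classify a previously unseen sensor description.
print(model.predict(["Aqara humidity temperature sensor kitchen zigbee"]))
```

In practice one would train separate models (or a multi-output model) for type, location and appliance, and feed them structured gateway metadata and user input rather than a single free-text string.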
How did you acquire the data you needed to do that?
For this project we already had access to data from some Smart Home gateways via another project – GrowSmarter – and we used the data and the knowledge acquired during that project as a starting point. But we needed much more for the IoTCrawler project, so we opened the testbed up to a bigger community, and currently we have around 30 Smart Home gateways which provide data from around 1500 sensors. The users who participate in our testbed are technology-eager Smart Home users who already use Smart Home technology extensively, so they have quite a few sensors connected – up to 130 sensors per home. The devices in these Smart Home environments range from Smart Plugs, Motion Sensors and Humidity Sensors to Weather Stations and Thermostats, to name some.
Would it be accurate to say that you are working with something that hasn’t been explored or described before?
Yes, that’s right. It is very interesting to tackle something that hasn’t been implemented or even attempted before, applying experimental and innovative approaches. From the start we have had to think not only about making it work but about making it useful – answering questions such as: What is realistic? What is the best possible solution given the resources we have?
It is challenging to explore the potential while taking the domain restrictions into consideration, and at the same time we also want to showcase a lot of the features and ideas that we came up with during the exploration phase.
What are your aspirations and hopes for the IoTCrawler?
The attempt to influence the standardization of the IoT domain is what matters most to me – moving the domain towards a more semantic representation of data for the different businesses that use IoT data or want to incorporate it. The way I see it, the most value comes from extracting knowledge from the huge amounts of IoT data we have, with the help of machine learning techniques.
Usually data of this nature is heterogeneous: it comes from many providers and is not structured. The most value, from my point of view, lies in extracting and structuring the data in a more systematic way, which will lead to the re-usability of data and will boost application development.
Do you have any advice for others who wish to build new demos?
Keep in mind that this architecture is new and innovative, so start small and simple at first and add features one at a time to gain the most benefit – and you will be part of developing an innovative solution.