Database structure / type - sensor data

Database structure / type - sensor data

Author
Discussion

toohuge

Original Poster:

3,449 posts

222 months

Thursday 1st September 2022
quotequote all
All,

I am working on a project at the moment that will likely form part of my masters thesis. I am looking at sensor data from several sites, and each site has several sensors (and this will likely change over time too) and the sensor data at the moment is recorded every 15 minutes.

I am looking at a way to collect this data and perform analysis on this and gain some insight into each site performance and how they compare with each other within each site and eventually, across sites as more come online.

I have a small amount of experience using Firebase / Firestore and I have created a DB model in Firestore with the following structure (using documents and sub collections etc):

Site->property->date->time->sensor1 / sensor2

This looks great, however it does perform a high number of document rights and in the long run, will be expensive and data there's a lack of built in / convenient data analytic tools whilst the data is in this store. But should be easier to use if we go down the route of an app based interface in the future for dashboarding.

I am considering the following technologies:

mongoDB
Neo4j
Amazon Neptune

I was looking at mongoDB and as a lot of my data is relational, it will require using an ancestor schema (I think) and even then, mongoDB is not a native graph db and it does not seem to be the best option, if other, more suitable technologies are available.

Neo4j and Amazon Neptune both look interesting and from what I understand, a graph db is a more appropriate tool to use given all of the readings are related to to time, date, property and site location.

Long post, but I was hoping someone may be able to steer me in the right direction / away from a disaster!

Thanks
Chris

S13_Alan

1,336 posts

249 months

Thursday 1st September 2022
quotequote all
Sounds like a fun project, especially if you've never done anything like it before.

I'm struggling with the idea that a graph db is the best for this though, don't really see it.

Can see how some type of document db like mongo (or perhaps better a messaging queue, or something like Kafka or Redis) might work as an initial destination for collecting the data as it's sent from the sensor if it's all different formats and response time matters, you just want it stored somehow.

Then, you'd have something else that runs which processes and collates the data coming in, having been stored temporarily in the "queue", putting in it's final destination and transforming it in a way that works for the type of reporting you want. Likely this goes into some type of relational db then.

For a small scale project there's no real right or wrong if you can find a way to justify it and you learn from the mistakes - things become a bit more difficult when it's for real, as you often would tend to stay on the safe side.

juice

8,772 posts

288 months

Thursday 1st September 2022
quotequote all
Splunk (Free) would ingest that data quite easily.

gavsdavs

1,203 posts

132 months

Thursday 1st September 2022
quotequote all
Splunk, or elasticsearch (god help you)

juice

8,772 posts

288 months

Thursday 1st September 2022
quotequote all
gavsdavs said:
Splunk, or elasticsearch (god help you)
Elastic - now there's a wormhole you don't wanna go down....

gavsdavs

1,203 posts

132 months

Thursday 1st September 2022
quotequote all
juice said:
Splunk (Free) would ingest that data quite easily.
Pro-tip. You can get a small (1gb/day I think but with all features enabled) license if you sign up for the developer program and say you're developing apps.
Means you can schedule searches/do auth/run deployment server blaa blaa if you wish.

toohuge

Original Poster:

3,449 posts

222 months

Friday 2nd September 2022
quotequote all
Thank you all for the recommendations, it is very much appreciated!

I realise I had an error in abstraction, attempting to make a nosql database relational.... even though they don't work in such ways....

I will look into Splunk and mongoDB - I've got all the data into mongoDB atlas and will have a play around today as well as looking at Splunk.

Thanks for the input!
Chris

eps

6,398 posts

275 months

Friday 2nd September 2022
quotequote all
Take a look at Prometheus and Grafana as well

dazmanultra

442 posts

98 months

Friday 2nd September 2022
quotequote all
I don't see why you need to use anything particularly special to store the data, it sounds a pretty simple set up and you're not even going to be storing that much either? Just use MySQL or PostgreSQL. You could use Amazon RDS or Digital Ocean's Managed Database products if you want to outsource the actual running of the underlying systems (so no sysadmin required), and it makes backups etc very straightforward as well.

Magnum 475

3,628 posts

138 months

Friday 2nd September 2022
quotequote all
I built something similar to what you’re describing for a manufacturing company a few years ago, with several hundred sensors in place. The IoT software vendor recommended Postgres as being the strongest performing platform for time series data, so would recommend you take a good look!

HappyMidget

6,788 posts

121 months

Friday 2nd September 2022
quotequote all
bks to all the st posted already, use Databricks and Delta tables. Can handle everything from the tiny amount of data you are using to the 5.5PB of telemetary data Apple process per day. mongoDB has caused a lot of headaches for some of my previous devs who reverting to SQL server after a trial run.

HappyMidget

6,788 posts

121 months

Friday 2nd September 2022
quotequote all
Magnum 475 said:
I built something similar to what you’re describing for a manufacturing company a few years ago, with several hundred sensors in place. The IoT software vendor recommended Postgres as being the strongest performing platform for time series data, so would recommend you take a good look!
I worked for one of the top 5 mining companies in the world with their sensor data. PG was nowhere near even being a candidate.

anonymous-user

60 months

Friday 2nd September 2022
quotequote all
mongodb and be done with it.

Read your data from your sensors somehow and fire it at an API that writes to a collection. Make up a JSON schema for your doc format and job done. Either use mongodb's own data API or write your own and host it on some free cloud service. mongo can be accessed by all sorts of tools now for analytics either directly or via APIs.

A measure every 15 min for each sensor is hardly going to need anything you will have to pay much for if anything at all.

HappyMidget

6,788 posts

121 months

Friday 2nd September 2022
quotequote all
MikeHo said:
mongodb and be done with it.

Read your data from your sensors somehow and fire it at an API that writes to a collection. Make up a JSON schema for your doc format and job done. Either use mongodb's own data API or write your own and host it on some free cloud service. mongo can be accessed by all sorts of tools now for analytics either directly or via APIs.

A measure every 15 min for each sensor is hardly going to need anything you will have to pay much for if anything at all.
Worst advice I have ever seen.

anonymous-user

60 months

Friday 2nd September 2022
quotequote all
HappyMidget said:
MikeHo said:
mongodb and be done with it.

Read your data from your sensors somehow and fire it at an API that writes to a collection. Make up a JSON schema for your doc format and job done. Either use mongodb's own data API or write your own and host it on some free cloud service. mongo can be accessed by all sorts of tools now for analytics either directly or via APIs.

A measure every 15 min for each sensor is hardly going to need anything you will have to pay much for if anything at all.
Worst advice I have ever seen.
Practical...... but clearly not what a top 5 mining company would use.

Olivera

7,581 posts

245 months

Friday 2nd September 2022
quotequote all
Pick a traditional relational database supporting SQL and forget about document databases. Misuse and overuse of document databases is unfortunately rife.

HappyMidget

6,788 posts

121 months

Friday 2nd September 2022
quotequote all
MikeHo said:
Practical...... but clearly not what a top 5 mining company would use.
Not what any company would use if they really want to analyse the data tbh. A document DB is not the correct data storage medium for sensor data. MSSQL would be far mor efficient on the query side. Delta is the middle ground for this. CosmosDB is never the right answer.

anonymous-user

60 months

Friday 2nd September 2022
quotequote all
HappyMidget said:
MikeHo said:
Practical...... but clearly not what a top 5 mining company would use.
Not what any company would use if they really want to analyse the data tbh. A document DB is not the correct data storage medium for sensor data. MSSQL would be far mor efficient on the query side. Delta is the middle ground for this. CosmosDB is never the right answer.
The guy's doing a thesis for a master's not writing a national anpr tracking system for the cops. Or a system for the actual top mining company.

I knew as soon as a 'programming/development' related topic appeared that it would become a nerd willy wave on the first page. Practical approaches thrown out for the sake of someone's ego.

Well done!!

HappyMidget

6,788 posts

121 months

Friday 2nd September 2022
quotequote all
Olivera said:
Pick a traditional relational database supporting SQL and forget about document databases. Misuse and overuse of document databases is unfortunately rife.
Schema on read is prob the worst thing ever and it is what created the data swamp.

Me, I prefer the data lakehouse with a rigid schema on write using delta.

HappyMidget

6,788 posts

121 months

Friday 2nd September 2022
quotequote all
Also just re-read your original post and it seems you are trying to mangle your data into a graph db for some reason. usually in the real world this is never the correct answer. Though in this case what you are looking at is IOT data. Nothing you have looked at is a good fit for this in my opinion, I would happily spend a few hours chatting to you and working through what you actually need and recommend something proper for this. Azure Edge/IOT could easily do this and some of my former colleagues are some of the best in the field for this. But never mongo. And one of my best friends is a dev at mongo.