Data warehouse where to start
Discussion
This is a huge subject and it does depend on what your company uses right now, what experience they have and if they are prepared to hire.
While PowerBI would point to Azure/Microsoft, that isn't strictly necessary so you can largely use anything you want.
The main things to start with are - what are your data sources, what technology do they use and where are they (on-prem, cloud etc) and lastly, what skills do you and the business have?
How difficult that is depends on the data sources etc. You need to also think about how you join the data. What the data IS. Can you join your financials to your product type or media data or will you need to create some form of intermediate mapping?
What you want from this reporting needs to be decided on before you start as well. Do not keep adding requirements as it will never end. Aim for x,y,z reports to start and then go for another round of new things need to be added. This must be held to firmly, and with a complete requirements document.
The how is more important than the 'where', but if you have DBA's (for example) well versed in PostgreSQL, start with that. The same with SQL Server. I am making an assumption you do not have data engineers, but those things can come later. A simple warehouse can be 1 or more databases on a database instance that get aggregated and joined together for reporting.
A data lake can come after (if necessary), as a simple database would give you the fundamentals of what you want AND show the business the advantages.
While PowerBI would point to Azure/Microsoft, that isn't strictly necessary so you can largely use anything you want.
The main things to start with are - what are your data sources, what technology do they use and where are they (on-prem, cloud etc) and lastly, what skills do you and the business have?
How difficult that is depends on the data sources etc. You need to also think about how you join the data. What the data IS. Can you join your financials to your product type or media data or will you need to create some form of intermediate mapping?
What you want from this reporting needs to be decided on before you start as well. Do not keep adding requirements as it will never end. Aim for x,y,z reports to start and then go for another round of new things need to be added. This must be held to firmly, and with a complete requirements document.
The how is more important than the 'where', but if you have DBA's (for example) well versed in PostgreSQL, start with that. The same with SQL Server. I am making an assumption you do not have data engineers, but those things can come later. A simple warehouse can be 1 or more databases on a database instance that get aggregated and joined together for reporting.
A data lake can come after (if necessary), as a simple database would give you the fundamentals of what you want AND show the business the advantages.
Edited by Monsterlime on Monday 13th January 18:06
petemurphy said:
Sooo
Mid size company we have many different systems that we want power bi reporting on.
In my head all data needs to go somewhere central then we report on that.
I agree with Monsterlime's points.Mid size company we have many different systems that we want power bi reporting on.
In my head all data needs to go somewhere central then we report on that.
Additionally, in many companies I have worked with (very big ones, stuffed with experts) the activity to "put all data somewhere central" becomes the end in itself, and everyone loses sight of the "why" i.e. the reporting etc. As a nimble mid-size company, try avoid moving any data if you can. KISS.
Most "data lakes" have very murky depths.
What are the reporting requirements? I presume they are numerous and transcend multiple subject areas across multiple systems.
Get a full list and prioritise with the business also understanding the complexity of each.
Build some sort of high-level model showing how it all might fit together (you can expand this out as you deliver capability)
Do a couple of easy ones first (Source - ETL or ELT and Interrogate) to prove the concept and learn the ropes and, most importantly, convince the business you can do it.
Data Lake - Most are no more than an unregulated dump (murky depths certainly describes what it could turn into - If used properly it does have potential)
I've been working in this area since before the term Data Warehouse was coined - As said earlier, it can be a very massive and daunting undertaking - Nibble away and grow it gradually.
Get a full list and prioritise with the business also understanding the complexity of each.
Build some sort of high-level model showing how it all might fit together (you can expand this out as you deliver capability)
Do a couple of easy ones first (Source - ETL or ELT and Interrogate) to prove the concept and learn the ropes and, most importantly, convince the business you can do it.
Data Lake - Most are no more than an unregulated dump (murky depths certainly describes what it could turn into - If used properly it does have potential)
I've been working in this area since before the term Data Warehouse was coined - As said earlier, it can be a very massive and daunting undertaking - Nibble away and grow it gradually.
biggiles said:
I agree with Monsterlime's points.
Additionally, in many companies I have worked with (very big ones, stuffed with experts) the activity to "put all data somewhere central" becomes the end in itself, and everyone loses sight of the "why" i.e. the reporting etc. As a nimble mid-size company, try avoid moving any data if you can. KISS.
Most "data lakes" have very murky depths.
Oh goodness yes. I have seen multimillion dollar projects to dump everything somewhere and nobody does anything with it because either nobody knows it exists, nobody knows what to do with it or how to use it.Additionally, in many companies I have worked with (very big ones, stuffed with experts) the activity to "put all data somewhere central" becomes the end in itself, and everyone loses sight of the "why" i.e. the reporting etc. As a nimble mid-size company, try avoid moving any data if you can. KISS.
Most "data lakes" have very murky depths.
Then you just end up with massive data duplication, no standardisation and shadow IT doing their own reporting elsewhere completely defeating the purpose.
Sigh.
Generally you only need to pull the data into a central location if you are going to be doing lots of processing on it that could impact real time operations.
What sort of data are you talking about and where is it currently, how ‘real time’ do you need the analysis, what output are you trying to get and for what benefit?
As already said keeping it simple may well be the secret to success.
What sort of data are you talking about and where is it currently, how ‘real time’ do you need the analysis, what output are you trying to get and for what benefit?
As already said keeping it simple may well be the secret to success.
Storing data from multiple sources together doesn't magically make it into a common structure that can be queried across.
Unless you are a data engineer then the advantage of having unrelated data all in one place as a data warehouse or datalake is that it will be easier to delete.
So if you don't have one, don't have any data engineers who want one, don't create one.
If you think it will make it easier to manage, it won't, because now instead of having data in many different operational systems, you'll have data in those many systems, plus 1 more and many different data pipelines to manage.
Power BI is very capable of handling multiple data sources and can do a fair amount of combining data where it is being presented,
Wait until you are pushing the limits of what you can achieved in power BI, then if you do you will have some specific requirements to steer you decisions on building out your data infrastructure.
Unless you are a data engineer then the advantage of having unrelated data all in one place as a data warehouse or datalake is that it will be easier to delete.
So if you don't have one, don't have any data engineers who want one, don't create one.
If you think it will make it easier to manage, it won't, because now instead of having data in many different operational systems, you'll have data in those many systems, plus 1 more and many different data pipelines to manage.
Power BI is very capable of handling multiple data sources and can do a fair amount of combining data where it is being presented,
Wait until you are pushing the limits of what you can achieved in power BI, then if you do you will have some specific requirements to steer you decisions on building out your data infrastructure.
Thanks all very interesting.
The main system I worry about is a v old till “erp” system that runs on sql that when they tried a bi report made it run even slower. Other bits are more industry standard like oracle opera.
We do eventually want a proper erp like dynamics would that use a warehouse?
Where does dataverse come in if at all?
Thanks
The main system I worry about is a v old till “erp” system that runs on sql that when they tried a bi report made it run even slower. Other bits are more industry standard like oracle opera.
We do eventually want a proper erp like dynamics would that use a warehouse?
Where does dataverse come in if at all?
Thanks
If you are planning to move to a new ERP I'd firm that up and make sure your reporting requirements are in your system selection requirements. New ERP's are usually better than old ones due to more efficient data storage, better write/read etc.... You can spend a shedload of money on data warehouses and the tools to extract and display the data contained within, which could have been spent on a, well, ERP system, and gotten you what you need.
If you're shopping for a new ERP do as much research as you can yourself. Know what you need it to do and then research the capabilities of whatever systems you're thinking of (or vendors and system integrators) are throwing at you.
If you're shopping for a new ERP do as much research as you can yourself. Know what you need it to do and then research the capabilities of whatever systems you're thinking of (or vendors and system integrators) are throwing at you.
As a simple starter for ten that can use existing data sources, Tableau is very fast to get off the ground and pretty powerful.
Not sure what they're like these days but Zoho Reports is another.
As an aside, when presenting and collating data, don't forget to ask "So what?". Pulling, grouping and displaying data is the easy bit.
Not sure what they're like these days but Zoho Reports is another.
As an aside, when presenting and collating data, don't forget to ask "So what?". Pulling, grouping and displaying data is the easy bit.

I've been in the Data warehousing, BI, MI, Analytics and Data Science arena for my entire career (> 25 years), as others have said, there's no one-size-fits-all solution. My advice echoes others, keep it simple!
The best advice that is often ignored is to fix any data issues at source rather than bringing together multiple sources and attempting to fix the problems. Compounding the issue is a common mistake and is costly/time-consuming to remedy. A data lake often becomes a data swamp in my experience.
Don't go for a big bang approach, start simple, and use Power BI or similar to access the data at source where possible rather than build a DW. This will quickly highlight any data quality issues and give you a better idea of what's needed as you add data from other sources. Data warehousing projects often fail due to data quality and the time it takes to build. By the time the project is delivered, the business requirements have changed, and it can be clunky to re-engineer to meet the new requirements.
User adoption fails due to a lack of trust in the data. The business then reverts to departmental spreadsheets with layers of fiddle factors and full of human errors, and employees then spend half of their time in meetings arguing about whose numbers are correct rather than making decisions based on trusted data.
The best advice that is often ignored is to fix any data issues at source rather than bringing together multiple sources and attempting to fix the problems. Compounding the issue is a common mistake and is costly/time-consuming to remedy. A data lake often becomes a data swamp in my experience.
Don't go for a big bang approach, start simple, and use Power BI or similar to access the data at source where possible rather than build a DW. This will quickly highlight any data quality issues and give you a better idea of what's needed as you add data from other sources. Data warehousing projects often fail due to data quality and the time it takes to build. By the time the project is delivered, the business requirements have changed, and it can be clunky to re-engineer to meet the new requirements.
User adoption fails due to a lack of trust in the data. The business then reverts to departmental spreadsheets with layers of fiddle factors and full of human errors, and employees then spend half of their time in meetings arguing about whose numbers are correct rather than making decisions based on trusted data.
^^^ all good advice above.
Create some basic, well defined report requirements that management want to start with. No scope creep. What information are they looking for tied to what business outcomes and KPIs? Build the project into phases so that you will have clear milestones and deliverables.
Manage expectations upwards. Even well curated data from seemingly simple ERP systems will have quirks that can skew reports.
Once you understand your data landscape and requirements, you might benefit from a external consultant for a short period to validate your thinking and/or develop some alternative approaches. A good consultant will help you shave a lot of time off your project, but don't become dependent on them
From the above you can then consider the technical solutions (choose something well supported rather than niche with fewer developers; many will give you some free credits to try before you buy). Develop a basic MVP (minimal viable product) before considering the (now massive) list of additional reports that management now see can be created...
Create some basic, well defined report requirements that management want to start with. No scope creep. What information are they looking for tied to what business outcomes and KPIs? Build the project into phases so that you will have clear milestones and deliverables.
Manage expectations upwards. Even well curated data from seemingly simple ERP systems will have quirks that can skew reports.
Once you understand your data landscape and requirements, you might benefit from a external consultant for a short period to validate your thinking and/or develop some alternative approaches. A good consultant will help you shave a lot of time off your project, but don't become dependent on them

From the above you can then consider the technical solutions (choose something well supported rather than niche with fewer developers; many will give you some free credits to try before you buy). Develop a basic MVP (minimal viable product) before considering the (now massive) list of additional reports that management now see can be created...
KennyN said:
User adoption fails due to a lack of trust in the data. The business then reverts to departmental spreadsheets with layers of fiddle factors and full of human errors, and employees then spend half of their time in meetings arguing about whose numbers are correct rather than making decisions based on trusted data.
100%.Plus the data from the ERP may have "features" built in that are invisible to the end user... some POS function might create multiple data entries to achieve a function that was not initially considered and a direct look at the data might not make sense at first.
And that's before you consider that different sources might be using different units... the classic one being an engineer that ordered 300m of wire on thr ordering system and was called to ask where he wanted the 300 miles (rather than metres) delivered to...
one other point to think about is that if you have existing reports that contain the data you are interested in you can pull those together into power bi and then use that to present the data in a much more readable fashion. this can be combined with excel sheets to measure KPI's, performance against budget and all sorts of actually useful views of the data without needing a full on data warehouse project.
Chimune said:
Sorry if missing something, but as a midsize org with lots of data, dont you have a IT dept to give you advice on this?
A very good question. One possible answer is "yes, but I don't trust them", and that doesn't have to be as negative as it seems. Sometimes they're an ancillary part of the business, treated entirely as a cost centre, and not expected to innovate or think strategically about the business. HR often gets treated that way too. In a lot of businesses these are wasted opportunities, but far from fatal.
Gassing Station | Computers, Gadgets & Stuff | Top of Page | What's New | My Stuff