Big data has changed the way many businesses operate. From healthcare to BI to HR to smart manufacturing, every organization is trying to get the most out of its data and gain more insight into its operations. However, no matter what the application is, almost 80 percent of the effort in a data project goes into making the data ready. What does that mean?
Let's start by breaking a big data flow into 3 main pieces:
raw data -> warehouse -> learning/mining
Every project starts with collecting raw data. The raw data usually arrives as a massive amount of both structured and unstructured data, and it is typically generated by multiple sources that differ in timing, nature, and format. It is hardly possible to get any insight or reasoning by just looking at the raw data. A significant amount of work is required to convert raw data into information, which is traditionally stored in a data warehouse (a.k.a. an information warehouse). Data preparation, integration, cleansing, transformation, and so on all belong to this step; it is what makes the data ready for mining and learning. Without this preparation there can be no analysis, and it usually accounts for 80% of the effort in the data project. Once the data is ready and loaded into the warehouse, one can use machine learning and statistics to create intelligence and knowledge consumable by humans.
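To make the preparation step more concrete, here is a minimal sketch in Python with pandas of what cleansing, transformation, and loading into a warehouse-friendly table can look like. The file names, column names, and aggregation are illustrative assumptions, not part of any particular project.

# A minimal sketch of the preparation step, assuming a hypothetical
# CSV export of raw events with messy timestamps and duplicate rows.
import pandas as pd

# Load raw, semi-structured data (file name and columns are assumptions).
raw = pd.read_csv("raw_events.csv")

# Cleansing: drop exact duplicates and rows missing the key fields.
clean = raw.drop_duplicates().dropna(subset=["user_id", "event_time"])

# Transformation: normalize timestamps and derive a daily grain.
clean["event_time"] = pd.to_datetime(clean["event_time"], errors="coerce")
clean = clean.dropna(subset=["event_time"])
clean["event_date"] = clean["event_time"].dt.date

# Integration/loading: aggregate into a warehouse-friendly fact table.
daily_counts = (
    clean.groupby(["user_id", "event_date"])
         .size()
         .reset_index(name="event_count")
)

# At this point daily_counts is ready for mining/learning.
daily_counts.to_parquet("warehouse/daily_events.parquet", index=False)

In a real project this logic would typically live in an ETL/ELT pipeline (the kind of tooling the vendors listed below provide), but the shape of the work is the same: clean, normalize, integrate, and land the result in the warehouse.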
For more information on this topic, see the following:
http://www.informatica.com/Images/02342_preparing-for-big-data-journey_wp_en-US.pdf
Some companies that provide services for that 80 percent:
http://www.platfora.com/
http://www.pentaho.com/
http://www.trifacta.com/