Big Data
From Christoph's Personal Wiki
Revision as of 00:01, 26 February 2017 by Christoph
Big Data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Challenges include capture, storage, analysis, data curation, search, sharing, transfer, visualization, querying, updating, and information privacy. The term "big data" often refers simply to the use of predictive analytics, user behaviour analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set.
- The "4 V's of Big Data" (Doug Laney's original three V's of volume, velocity, and variety, plus veracity, which was added later):[1]
- Volume
- Extremely large volumes of data (e.g., petabytes or exabytes, as of February 2017)
- Variety
- Various forms of data (structured, semi-structured, and unstructured)
- Velocity
- Data arriving in real time (e.g., from IoT devices, social media, or sensors), in batches, or as continuous streams
- Veracity or variability
- Inconsistent, sometimes inaccurate, varying data
- Format of Big Data:
- Structured
- Data that has a defined length and format (aka "schema"). Examples include numbers, words, dates, etc. Easy to store and analyse. Often managed using SQL.
- Semi-structured
- Between structured and unstructured. Does not conform to a fixed schema, but is self-describing and typically built from simple key-value pairs. Examples include JSON, SWIFT (financial transactions), and EDI (used in healthcare, among other industries).
- Unstructured
- Data that does not follow a specific format. Examples include audio, video, images, text messages, etc.
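The three formats above can be made concrete with a small sketch. It assumes a hypothetical payment record (the table name, field names, and values are invented for illustration) and uses Python's standard `sqlite3` and `json` modules. The `describe` helper is only a toy heuristic for this example, not a general way to detect a data format.

```python
import json
import sqlite3

# Structured: a fixed schema, managed with SQL (an in-memory SQLite table here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER, amount REAL, paid_on TEXT)")
conn.execute("INSERT INTO payments VALUES (?, ?, ?)", (1, 19.99, "2017-02-26"))
structured_row = conn.execute(
    "SELECT id, amount, paid_on FROM payments"
).fetchone()

# Semi-structured: no fixed schema, but self-describing key-value pairs
# that may nest and may differ from one record to the next.
semi_structured = json.loads(
    '{"id": 2, "amount": 5.0, "tags": ["gift"], "note": {"lang": "en"}}'
)

# Unstructured: free text (or raw bytes) with no declared fields at all.
unstructured = "Paid 7 euros to Bob for lunch yesterday."


def describe(record):
    """Classify a record by the Python type it arrived in (toy heuristic)."""
    if isinstance(record, tuple):
        return "structured"
    if isinstance(record, dict):
        return "semi-structured"
    return "unstructured"


for record in (structured_row, semi_structured, unstructured):
    print(describe(record), "->", record)
```

The structured row can be queried by column name in SQL, the JSON record has to be navigated key by key, and the free-text message needs parsing or text analytics before any field can be extracted from it.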
- Big Data Analytics:
- Basic analytics
- Reporting, dashboards, simple visualizations, slicing and dicing.
- Advanced analytics
- Complex analytics models using machine learning, statistics, text analytics, neural networks, data mining, etc.
- Operationalized analytics
- Big data analytics embedded in a business process to streamline it and increase efficiency.
- Analytics for business decisions
- Implemented for better decision-making, which drives revenue.
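As a rough illustration of the "slicing and dicing" mentioned under basic analytics, the sketch below filters and totals a handful of records in plain Python. The `sales` records, their field names, and the helpers `slice_by` and `total_by` are all hypothetical; a real reporting stack would more likely run the equivalent query in SQL or a dashboard tool.

```python
from collections import defaultdict

# Hypothetical sales records; field names and values are illustrative only.
sales = [
    {"region": "EU", "product": "A", "revenue": 100},
    {"region": "EU", "product": "B", "revenue": 50},
    {"region": "US", "product": "A", "revenue": 70},
    {"region": "US", "product": "A", "revenue": 30},
]


def slice_by(records, **fixed):
    """Keep only records whose fields equal the given values.

    Fixing one dimension is a "slice"; fixing several is a "dice".
    """
    return [r for r in records if all(r[k] == v for k, v in fixed.items())]


def total_by(records, *dims, measure="revenue"):
    """Roll up: sum a measure for each combination of the chosen dimensions."""
    totals = defaultdict(int)
    for r in records:
        totals[tuple(r[d] for d in dims)] += r[measure]
    return dict(totals)


by_region = total_by(sales, "region")
product_a_by_region = total_by(slice_by(sales, product="A"), "region")
print(by_region)
print(product_a_by_region)
```

Here `by_region` is the kind of number a simple report would show, while `product_a_by_region` shows the same total after slicing the data down to a single product.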
Human- vs. machine-generated data.
- Hadoop
- A software ecosystem that enables massively parallel computations distributed across thousands of (commodity) servers in a cluster
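The page does not name a programming model, but MapReduce is one of the models Hadoop supports, and a word count is its customary first example. The sketch below only imitates the idea in a single Python process: it does not use Hadoop or any Hadoop API, threads stand in for cluster nodes, and the function names and input chunks are made up for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit (word, 1) pairs for one chunk, independently of other chunks."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values collected for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks, workers=4):
    # Threads are a stand-in for servers in a cluster (an assumption of this
    # sketch); each chunk is mapped without needing data from any other chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


chunks = ["big data is big", "data is everywhere"]
counts = word_count(chunks)
print(counts)
```

Because each chunk is mapped without reference to any other, the map step can be spread over as many workers as there are chunks, which is the property that lets this style of computation scale out across commodity servers.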
References
- [1] 4 Vs For Big Data Analytics. 2013-06-31.