Big Data
From Christoph's Personal Wiki
Big Data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Challenges include capture, storage, analysis, data curation, search, sharing, transfer, visualization, querying, updating, and information privacy. The term "big data" often refers simply to the use of predictive analytics, user behaviour analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set.
- The "4 V's of Big Data" (Doug Laney's original three V's, plus veracity):[1]
  - Volume
    - Extremely large volumes of data (e.g., petabytes or exabytes, as of February 2017)
  - Variety
    - Various forms of data (structured, semi-structured, and unstructured)
  - Velocity
    - Real-time, batch, or streaming data (e.g., from IoT devices, social media, or sensors). Usually either human- or machine-generated.
  - Veracity or variability
    - Inconsistent, sometimes inaccurate, varying, or missing data
- Format of Big Data:
  - Structured
    - Data that has a defined length and format (aka "schema"). Examples include numbers, words, dates, etc. Easy to store and analyse. Often managed using SQL.
  - Semi-structured
    - Between structured and unstructured. Does not conform to a fixed schema, but is self-describing, typically using simple key-value pairs. Examples include JSON, SWIFT messages (financial transactions), and EDI (e.g., healthcare).
  - Unstructured
    - Data that does not follow a specific format. Examples include audio, video, images, text messages, etc.
- Big Data Analytics:
  - Basic analytics
    - Reporting, dashboards, simple visualizations, slicing and dicing.
  - Advanced analytics
    - Complex analytical models using machine learning, statistics, text analytics, neural networks, data mining, etc.
  - Operationalized analytics
    - Big data analytics embedded in a business process to streamline it and increase its efficiency.
  - Analytics for business decisions
    - Used to support better decision-making, which drives revenue.
- What is IoT?
  - Internet of Things
  - Physical objects that are connected to the Internet
  - Identified by an IP address (IPv4 now; IPv6 in the future)
  - Devices communicate with each other and with other Internet-enabled devices and systems
  - Includes everyday devices that use embedded technology to communicate with the external environment by connecting to the Internet
  - IoT data is high volume, high velocity, and high variety, and its veracity is often uncertain
- Examples of IoT:
  - Security systems
  - Thermostats (e.g., Nest)
  - Vehicles
  - Electronic appliances
  - Smart lighting in households or commercial buildings (e.g., Philips Hue)
  - Fitness devices (e.g., Fitbit)
  - Sensors that measure environmental parameters (e.g., temperature, humidity, wind)
- Hadoop
  - A software ecosystem that enables massively parallel computation distributed across thousands of (commodity) servers in a cluster
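Hadoop's classic processing model is MapReduce: a map step turns each input record into key-value pairs, the framework groups (shuffles) those pairs by key, and a reduce step aggregates each group. The following is a minimal single-process sketch of that model in Python, using the usual word-count example. It only simulates the three phases locally; it is not Hadoop's API, and the function names are hypothetical. On a real cluster, the map and reduce calls would run in parallel on many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper (hypothetical name): emit (word, 1) for each word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer (hypothetical name): sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    # Locally, every line is mapped in sequence; on a cluster, each input split
    # would be mapped on a different node.
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data needs big clusters", "Data everywhere"]
    print(word_count(docs))
```

Because each mapper only sees its own record and each reducer only sees one key's values, the work can be split across machines without the steps having to coordinate with each other.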
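To illustrate the difference between the structured and semi-structured formats described above, here is a small Python sketch. The CSV rows all share one fixed schema, while each JSON record describes itself with its own keys, so records can differ. The column names and sample values are hypothetical and chosen only for illustration.

```python
import csv
import io
import json

# Structured: every record follows the same fixed schema (hypothetical columns).
SCHEMA = ["sensor_id", "timestamp", "temperature_c"]
csv_text = "s1,2017-02-01T10:00:00,21.5\ns2,2017-02-01T10:00:00,19.0\n"
structured_rows = [
    dict(zip(SCHEMA, row)) for row in csv.reader(io.StringIO(csv_text))
]

# Semi-structured: each JSON record carries its own keys, and records may differ.
json_lines = [
    '{"sensor_id": "s1", "temperature_c": 21.5}',
    '{"sensor_id": "s3", "humidity_pct": 40, "tags": ["indoor"]}',
]
semi_structured_rows = [json.loads(line) for line in json_lines]


def field_sets(records):
    """Return the set of field names used by each record."""
    return [frozenset(record) for record in records]


if __name__ == "__main__":
    # Structured data: every row uses the same set of fields.
    print(set(field_sets(structured_rows)))
    # Semi-structured data: the set of fields varies from record to record.
    print(field_sets(semi_structured_rows))
```

Note that the CSV reader yields every value as a string, because the schema here lives only in the column order. The JSON parser, in contrast, recovers numbers and lists directly from the self-describing text.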
References
- [1] 4 Vs For Big Data Analytics. 2013-06-31.