- Book Name: An Introduction to Data: Everything You Need to Know About AI, Big Data and Data Science
- Author: Francesco Corea
- Pages: 131
- Size: 3 MB
An Introduction to Data by Francesco Corea PDF
Contents of An Introduction to Data by Francesco Corea PDF
- Introduction to Data
- Big Data Management: How Organizations Create and Implement Data Strategies
- Introduction to Artificial Intelligence
- AI Knowledge Map: How to Classify AI Technologies
- Advancements in the Field
- AI Business Models
- Hiring a Data Scientist
- AI and Speech Recognition
- AI and Insurance
- AI and Financial Services
- AI and Blockchain
- New Roles in AI
- AI and Ethics
- AI and Intellectual Property
- AI and Venture Capital
- A Guide to AI Accelerators and Incubators
- Appendix A: Nomenclature for Managers
- Appendix B: Data Science Maturity Test
- Appendix C: Data Scientist Extended Skills List (Examples in Parentheses)
- Appendix D: Data Scientist Personality Questionnaire
Preface of An Introduction to Data by Francesco Corea PDF
There are many ways to define what big data is, and this is probably why it remains a difficult concept to grasp. Some describe big data as datasets above a certain threshold, e.g., over a terabyte (Driscoll 2010); others as data that crash conventional analytical tools such as Microsoft Excel. More renowned works, though, identify big data as data that display the features of Variety, Velocity, and Volume (Laney 2001; McAfee and Brynjolfsson 2012; IBM 2013; Marr 2015).
Even though they are all partially true, there is a definition that seems to better capture this phenomenon (Dumbill 2013; De Mauro et al. 2015; Corea 2016): big data analytics is an innovative approach that consists of different technologies and processes to extract worthy insights from low-value data that do not fit, for any reason, the conventional database systems. In the last few years the academic literature on big data has grown extensively (Lynch 2008). It is possible to find specific applications of big data to almost any field of research (Chen et al. 2014).
For example, big data applications can be found in medicine and health care (Murdoch and Detsky 2013; Li et al. 2011; Miller 2012a, b); biology (Howe et al. 2008); governmental projects and public goods (Kim et al. 2014; Morabito 2015); and financial markets (Corea 2015; Corea and Cervellati 2015). In other, more specific examples, big data have been used for energy control (Moeng and Melhem 2010), anomaly detection (Baah et al. 2006), crime prediction (Mayer-Schönberger and Cukier 2013), and risk management (Veldhoen and De Prins 2014). No matter what business is considered, big data are having a strong impact on every sector: Brynjolfsson et al. (2011) found that data-driven businesses perform between 5 and 6% better than their competitors.
Other authors instead focused their attention on organizational and implementation issues (Wielki 2013; Mach-Król et al. 2015). Marchand and Peppard (2013) indicated five guidelines for a successful big data strategy:
(i) placing people at the heart of Big Data initiatives;
(ii) highlighting information utilization to unlock value;
(iii) adding behavioral scientists to the team;
(iv) focusing on learning; and
(v) focusing more on business problems than technological ones.
Barton and Court (2012), on the other hand, identified three key features for exploiting big data's potential: choosing the right data, focusing on the biggest drivers of performance to optimize the business, and transforming the company’s capabilities. Data are quickly becoming a new form of capital, a different coin, and an innovative source of value. As mentioned above, it is important to channel the power of big data into an efficient strategy to manage and grow the business.
But it is also true that big data strategies may not be valuable for all businesses, mainly because of structural features of the business or company itself. However, a data strategy is still useful, no matter the size of your data. Hence, in order to establish a data framework for a company, a few misconceptions first need to be clarified:
(i) More data means higher accuracy.
Not all data are good-quality data, and tainting a dataset with dirty data could compromise the final products. It is similar to a blood transfusion: if a non-compatible blood type is used, the outcome can be catastrophic for the whole body. Secondly, there is always the risk of overfitting the model to the data without deriving any further insight: “if you torture the data enough, nature will always confess” (Coase 2012).
In all applications of big data, you want to avoid striving for perfection: too many variables increase the complexity of the model without necessarily increasing its accuracy or efficiency. More data always imply higher costs, but not necessarily higher accuracy. These costs include higher maintenance costs, both for physical storage and for model retention; greater difficulty in calling the shots and interpreting the results; and more burdensome data collection and time-opportunity costs.
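The overfitting risk described above can be illustrated with a small, self-contained sketch. It is not taken from the book: the linear "true" relationship, the noise level, the sample sizes, and all function names are assumptions chosen purely for illustration. A simple least-squares line is compared with a polynomial flexible enough to pass through every training point.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def fit_interpolant(xs, ys):
    """Lagrange polynomial of degree n-1 that passes through every training point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

def mse(model, xs, ys):
    """Mean squared error of a model on a set of points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: a linear signal plus Gaussian noise (assumed for illustration).
random.seed(0)
truth = lambda x: 2.0 * x + 1.0
train_x = [i / 9 for i in range(10)]
train_y = [truth(x) + random.gauss(0, 0.3) for x in train_x]
test_x = [(i + 0.5) / 9 for i in range(9)]  # held-out points between the training points
test_y = [truth(x) + random.gauss(0, 0.3) for x in test_x]

line = fit_line(train_x, train_y)
poly = fit_interpolant(train_x, train_y)

print("line: train MSE", mse(line, train_x, train_y), "test MSE", mse(line, test_x, test_y))
print("poly: train MSE", mse(poly, train_x, train_y), "test MSE", mse(poly, test_x, test_y))
```

The interpolating polynomial reproduces the training data exactly, so its training error is zero by construction, yet that says nothing about how it behaves on the held-out points, where such a flexible model typically does worse than the simple line. A perfect fit on the data at hand is therefore not evidence of higher accuracy.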
Undoubtedly, the data used do not have to be orthodox or used in a standard way (this is where the real gain is locked in), and they may challenge the conventional wisdom, but they have to be proven and validated. In summary, smart data strategies always start from analyzing internal datasets before integrating them with public or external sources.