Abstract
The past couple of decades have witnessed exponential growth in data, due to the penetration of information technology across all aspects of science and society; the increasing ease with which we are able to collect more data; and the growth of Internet-scale, planet-wide Web-based and mobile services—leading to the notion of “big data”. While the emphasis so far has been on developing technologies to manage the volume, velocity, and variety of the data, and to exploit available data assets via machine learning techniques, going forward the emphasis must also be on translational data science and the responsible use of all of these data in real-world applications. Data science in the 21st century must provide trust in the data and provide responsible and trustworthy techniques and systems by supporting the notions of transparency, interpretability, and reproducibility. The future offers exciting opportunities for transdisciplinary research and convergence among disciplines—computer science, statistics, mathematics, and the full range of disciplines that impact all aspects of society. Econometrics and economics can find an important role in this convergence of ideas.
Notes
- 1. Data and Society, https://datasociety.net/.
- 2. Convergence Research at NSF, https://www.nsf.gov/od/oia/convergence/index.jsp.
- 3.
- 4.
- 5. TFoDS Workshop, http://www.cs.rpi.edu/TFoDS/.
- 6. TFoDS Workshop Report, http://www.cs.rpi.edu/TFoDS/TFoDS_v5.pdf.
- 7. Apache MXNet, https://aws.amazon.com/mxnet/.
- 8. Google TensorFlow, https://www.tensorflow.org/.
- 9. Microsoft Cognitive Toolkit, https://www.microsoft.com/en-us/cognitive-toolkit/.
- 10. IBM Cognitive Computing, https://www.ibm.com/it-infrastructure/us-en/cognitive-computing/.
- 11. Administration Issues Strategic Plan for Big Data Research and Development, https://obamawhitehouse.archives.gov/blog/2016/05/23/administration-issues-strategic-plan-big-data-research-and-development.
- 12. Structured Query Language, https://www.w3schools.com/sql/sql_intro.asp.
- 13. NoSQL Databases, http://nosql-database.org/.
- 14. Apache Hadoop, http://hadoop.apache.org/.
- 15. Apache Spark, https://spark.apache.org/.
- 16. Apache Storm, http://storm.apache.org/.
- 17. Personal communication with R.V. Guha, July 2016.
- 18. NITRD Big Data Interagency Working Group (BDIWG), https://www.nitrd.gov/nitrdgroups/index.php?title=Big_Data.
- 19. Translational Data Science workshop, https://cdis.uchicago.edu/tds-17/.
- 20. Translational Research, Wikipedia, https://en.wikipedia.org/wiki/Translational_research.
- 21. National Center for Advancing Translational Science, https://ncats.nih.gov/.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Baru, C. (2018). Data in the 21st Century. In: Kreinovich, V., Sriboonchitta, S., Chakpitak, N. (eds) Predictive Econometrics and Big Data. TES 2018. Studies in Computational Intelligence, vol 753. Springer, Cham. https://doi.org/10.1007/978-3-319-70942-0_1
DOI: https://doi.org/10.1007/978-3-319-70942-0_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70941-3
Online ISBN: 978-3-319-70942-0
eBook Packages: Engineering, Engineering (R0)