This thesis concerns the use of big data in Official Statistics; its aim is to contribute new experimental studies to the big data literature and, in particular, to evaluate whether and how big data could be used in Official Statistics. Only a few experiments exist on the use of big data for statistical purposes; it is a challenging task, and much experimentation is still needed to gather evidence and find workable solutions. The analysis goes in two directions: (1) combining a traditional data source with a big data source to verify the potential of the latter to replicate official results; (2) analyzing a big data source per se and then combining it with an Official Statistics source to identify common patterns. The thesis opens with a literature review of definitions of big data and of existing experiments, in particular those that combine the new sources with traditional data sources. Three original studies follow. The first two concern mobility in the Lombardy region using mobile phone data. Both address the same issue (mobility patterns), but they differ in the traditional data source used: the Origin/Destination (O/D) matrix in the first case, an integrated version of the O/D matrix in the second. The objective of these two studies is to place one traditional statistical source and one typical kind of big data within a single interpretative framework, in order to evaluate the informative potential of this approach. In particular, we wanted to check whether the two sources show common patterns, so as to evaluate future uses of the big data source in Official Statistics. The third study presents the pilot carried out during the traineeship I attended at Eurostat, in collaboration with the Task Force Big Data. It concerns the use of Wikipedia, the free online encyclopedia, for Tourism Statistics. The aim is to evaluate whether Wikipedia page views can serve as a source of information for identifying the factors that drive tourism to an area, and whether tourism flows can be predicted from these data. A final chapter offers conclusions and remarks on the future use of big data in Official Statistics. Two of the studies (the first on mobility patterns and the one on Wikipedia) have been or are being published in shorter, revised versions. The three experiments show some potential for the use of big data in Official Statistics. The work needs more in-depth analysis, however: many more experiments and considerations will be necessary before definitive and convincing approaches can be reached.
Signorelli, S. (2017). The use of Big Data in Official Statistics [doctoral thesis]. Retrieved from http://hdl.handle.net/10446/77376
The use of Big Data in Official Statistics
Signorelli, Serena
2017-05-31
File | File size | Format | |
---|---|---|---|
TDUnibg52978.pdf (open access; version: postprint, refereed/accepted version without refereeing; license: Aisberg default license) | 6.78 MB | Adobe PDF | |
Files_SIGNORELLI.zip (archive managers only; version: not applicable; license: Aisberg default license) | 668.06 kB | zip | |
Aisberg ©2008 Library Services, Università degli studi di Bergamo