Combining traditional data with new data sources such as big data can improve the quality of data for planning and policymaking.
This was emphasized in a recent public webinar conducted by the Philippine Institute for Development Studies (PIDS) that featured the study “Addressing Data Gaps with Innovative Data Sources”.
Authored by PIDS Senior Research Fellow Jose Ramon Albert, Supervising Research Specialist Jana Flor Vizmanos, Research Analyst Mika Muñoz, and research consultants Arlan Brucal, Riza Teresita Halili, Angelo Jose Lumba, and Gaile Anne Patanñe, the study explored how new data sources can be used to address data gaps in analyzing development issues and programs.
According to Vizmanos, their study examined several data sources, such as PIDS’ website, Twitter, and news websites, to extract “meaningful insights” about users’ behavior and preferences using quantitative tools such as market basket analysis, text mining, and sentiment analysis.
To get data from Twitter and news sites, they extracted only publicly available data about violence against women (VAW) and tourism via web scraping using Python, a programming language.
Analysis of the web-scraped data on VAW revealed, among other findings, that news articles “lean on the negative side across all news sites”, especially in the last two years.
Meanwhile, text mining on tourism data showed that the words “Boracay”, “Cebu”, and “beach” were among the top search keywords online, with joy as the most prominent emotion based on sentiment analysis.
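To illustrate the kind of keyword ranking that text mining produces, here is a minimal Python sketch. It is not the authors' code: the sample posts, the stopword list, and the function name `top_keywords` are all hypothetical, and a real analysis would likely use a fuller text-processing pipeline.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real analysis would use a much fuller one.
STOPWORDS = {"the", "a", "in", "to", "and", "of", "is", "at", "on", "for"}

def top_keywords(texts, n=3):
    """Return the n most frequent non-stopword tokens across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep alphabetic tokens only.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)

# Made-up posts for illustration only; not data from the study.
sample_posts = [
    "Sunset at the beach in Boracay",
    "Cebu beach trip and island hopping",
    "Boracay beach is open to tourists",
]

print(top_keywords(sample_posts, n=2))
```

On these made-up posts, “beach” and “boracay” rank highest because they recur across posts; the same counting idea scales to scraped text once it has been cleaned.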
Vizmanos pointed out that, despite its limitations and risks, big data remains appealing because it offers “timely or near real-time” data at costs that are “generally little”.
“New data sources can complement traditional data sources used in official statistics to monitor socioeconomic conditions and various development outcomes… PIDS can harness innovative data sources to provide decisionmakers with near real-time information on policy issues,” Vizmanos said.
Co-author and United Nations Development Programme Data Analyst Angelo Jose Lumba added that while big data allows specific insights, it must not be used as primary data. “All data sources have their own strengths and limitations,” he said.
“It is important to recognize that new data sources complement but cannot replace traditional data sources, such as surveys and censuses, that undergo regular processes of data curation to maintain data quality in terms of relevance, accuracy, timeliness, accessibility, interpretability, and coherence,” the authors said in the study.
For his part, Lumba presented a study analyzing traffic congestion in Metro Manila using a dataset from Waze, a popular navigation mobile app. It revealed that Friday, Saturday, and Monday were the most congested days by total jam length from April 2019 to April 2022.
Peak traffic hours were between 6 am and 9 am and at 6 pm, which coincided with the morning and evening rush hours. Saturday recorded the highest average jam lengths, while Sunday had high peaks.
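Summaries like these come down to grouping jam records by day of the week and by hour. The Python sketch below shows one way to do that. It is not the study's code: the record layout (an ISO timestamp paired with a jam length in meters), the sample values, and the function names are assumptions, and the actual Waze dataset may be structured differently.

```python
from collections import defaultdict
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def total_jam_by_weekday(records):
    """Sum jam length (meters) for each day of the week."""
    totals = defaultdict(float)
    for timestamp, jam_length_m in records:
        # weekday() returns 0 for Monday through 6 for Sunday.
        day = DAYS[datetime.fromisoformat(timestamp).weekday()]
        totals[day] += jam_length_m
    return dict(totals)

def mean_jam_by_hour(records):
    """Average jam length (meters) for each hour of the day."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, jam_length_m in records:
        hour = datetime.fromisoformat(timestamp).hour
        sums[hour] += jam_length_m
        counts[hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sums}

# Made-up records for illustration only; not values from the Waze dataset.
sample_records = [
    ("2019-04-05T08:00:00", 1200.0),  # Friday morning
    ("2019-04-05T18:00:00", 1500.0),  # Friday evening
    ("2019-04-06T18:00:00", 2000.0),  # Saturday evening
    ("2019-04-08T08:00:00", 900.0),   # Monday morning
]

print(total_jam_by_weekday(sample_records))
print(mean_jam_by_hour(sample_records))
```

Keeping totals and averages separate matters here, since a day can rank high on total jam length without having the highest average, which is the distinction the findings draw between the most congested days and Saturday's highest average jam lengths.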
Highlighting the possible impact of using traffic data for policy development, he explained that the Waze dataset could be used to assess the viability and effectiveness of traffic-related innovations such as hybrid work, modified number coding schemes, and carpooling, and to probe into people’s behavior toward commuting.
“Merging traffic congestion data with datasets on health, environment, weather and climate, and the economy is needed for more nuanced analysis, sensible predictions, and impact measurement,” Lumba said.
Moving forward, the authors recommended continuously examining data and building capacity for data analytics, integrating new data sources with traditional data sources, and mitigating risks to protect data privacy and human rights.
“There needs to be a balance between protecting data privacy and harnessing the use of new data sources for safeguarding civil rights, ensuring fairness, and preventing discrimination,” the authors said in the study. (PIDS)