For those of you who’ve been listening to the show for a while, it is fairly obvious that there is, quite literally, a ton of data out there related to development initiatives and humanitarian assistance. If you had the time, money and desire, you could find data about almost any aspect of assistance: things like baseline data about a population, damage assessments, geospatial data, demographics of the people affected by a crisis, or which organizations, governments and companies are on the ground helping. The problem is that, in the humanitarian sector, organizations don’t have the time, money or people power to hunt down this data. An even bigger problem is that the data is locked in spreadsheets on individual laptops, captured only in written notes or, unfortunately, kept hidden as a potential competitive advantage.
Sarah Telford, my guest for the 129th episode of the Terms of Reference Podcast, is on a mission to change all of this. She is the Chief of Data Services at the United Nations Office for the Coordination of Humanitarian Affairs (OCHA), and oversees the continuing development of a global open data platform called the Humanitarian Data Exchange (HDX). The goal of HDX is to make humanitarian data easy to find and use for analysis; since its launch in July 2014, the platform has been accessed by users in over 200 countries and territories.
IN TOR 129 YOU’LL LEARN ABOUT
- The extent of the effort that must take place to aggregate humanitarian data, from large institutional sources, research teams and ground activity.
- The many steps to make data useful: gathering, cleaning, converting, validating, unifying…
- The fully open, no strings attached approach of HDX to share their stock of information, and the steps that must be taken to guarantee its financial viability.
- The role HDX played in the 2014 West Africa Ebola epidemic, and the role the epidemic played in shaping HDX going forward.
- Details on the selection and validation of data, and the promotion of data keeping standards.
- The role community building plays in guaranteeing quality, relevance and usefulness of data.
OUR CONVERSATION FEATURES THE FOLLOWING
- Humanitarian Innovation Fund (HIF)
- OCHA’s Humanitarian Data Exchange (HDX)
- World Bank’s GeoNode
- United Nations
- HIF Elrha’s Journey to Scale
- HBO’s Westworld
- Techonomy Conference
- Data collection and aggregation
- Data cleaning and validation
- Data visualization
- Machine-readable data
- Artificial Intelligence
- Internet of Things
- Vulnerability Assessment Mapping
- User driven design
- Data spaces
- Community building
- Data file formats: PDF, spreadsheets
- Data anonymization
- 2014 West Africa Ebola crisis
- Cash as better way of giving
- Westchester, New York
- Nairobi, Kenya
- The Hague, The Netherlands
EPISODE CRIB NOTES
The problem with data
Scattered in spreadsheets everywhere
Chief of Data Wizardry at OCHA
“There are a lot of hierarchies in the UN”
“Titles don’t fully reflect the job”
HDX collects crisis data from several organizations
Launched in July 2014
HDX plea is to make data available and useful
The Humanitarian Innovation Fund provided initial support; other funders have since joined
Data is distributed, as is the sector: many organizations, scattered everywhere
“There is no command and control”
OCHA is a lighthouse more than a panopticon
Data is used for one-time issues, generating “data mass graves”
While crisis situations have informed the design, HDX’s origins are academic
The World Bank works with GeoNode, which collects geospatial and other data
Interesting, but it is often not maintained; most local governments don’t prioritize this
To make something sustainable is to develop the architecture of maintenance
Metadata is important!
The World Bank understood this, but thinking about data in the middle of a crisis easily turns chaotic: gathering, unifying, wrangling…
“It’s all too common. We can do better. It should be easier”.
Even basic multi-purpose data streams (geography, population) are difficult to come by
And going deep into the community, there was frustration aplenty
Diplomatic exchange of data gifts among aid organizations
People can visit HDX and access data, no sign-up needed
To contribute data, a user must register on behalf of an organization
“Individuals don’t collect data” (…)
Organization sizes vary from large institutional players to university research teams
Submitting data from an organization is the first quality filter
To their surprise, organizations are not usually tidy with their data. Submitters are often ‘data activists’ from within who take charge of making it usable, on top of their contractual duties
Steps are taken to validate, clean and anonymize the data HDX receives
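The anonymization step mentioned above is often done through pseudonymization: direct identifiers are replaced with salted hashes so records can still be linked without exposing names. A minimal sketch of that idea (hypothetical fields and salt handling, not HDX’s actual process):

```python
import hashlib

SALT = "keep-this-secret"  # a real pipeline would store the salt securely

def pseudonymize(record, identifier_fields=("name", "phone")):
    """Replace direct identifiers with truncated, salted SHA-256 digests."""
    out = dict(record)
    for field in identifier_fields:
        if field in out:
            digest = hashlib.sha256((SALT + str(out[field])).encode()).hexdigest()
            out[field] = digest[:12]  # truncated for readability
    return out

record = {"name": "Jane Doe", "phone": "555-0100", "district": "Lofa"}
safe = pseudonymize(record)
print(safe["district"])  # non-identifying fields pass through unchanged
```

The same name always hashes to the same token, so two reports about one person remain linkable even after the name itself is gone.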
True stories about how HDX made a difference
“We have an idea of what gets viewed and downloaded, and which sets are more popular, but we do not follow up on how the data is used”
The HDX visualizations are held in high regard
HDX played a key role in the 2014 Ebola crisis. They had records of infections and casualties, and HDX made sure the data was machine-readable
The Ebola crisis data is by far the most popular set, and researchers still download it today to study the epidemic, the response, etc.
“Ebola put us on the map”. Since then, other HDX initiatives have built on this experience
It’s the simple things
Stephen: The big impact of sharing data in XLS instead of PDF
A DataLab in Nairobi collects data from 40 agencies. They asked HDX to help. First discovery: the data was stored in PDF
Furthermore, it was not standardized, hence it did not lend itself to comparison
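The two notes above capture why machine-readable, standardized data matters: once agencies use a shared column vocabulary, their figures can be compared programmatically instead of retyped from PDFs. A minimal illustration with hypothetical agency data and column names (not the Nairobi DataLab’s actual schema):

```python
import csv
import io

# Two hypothetical agency reports: same information, different headers.
agency_a = "district,cases\nBomi,12\nLofa,30\n"
agency_b = "District Name,Confirmed Cases\nBomi,15\nLofa,28\n"

# Map each agency's headers onto one shared vocabulary.
HEADER_MAP = {"district": "district", "District Name": "district",
              "cases": "cases", "Confirmed Cases": "cases"}

def normalize(raw_csv):
    """Read a CSV string and rename its columns to the shared vocabulary."""
    rows = csv.DictReader(io.StringIO(raw_csv))
    return [{HEADER_MAP[k]: v for k, v in row.items()} for row in rows]

# With shared headers, aggregation across agencies becomes trivial.
merged = {row["district"]: int(row["cases"]) for row in normalize(agency_a)}
for row in normalize(agency_b):
    merged[row["district"]] += int(row["cases"])

print(merged)  # {'Bomi': 27, 'Lofa': 58}
```

None of this is possible when the numbers are locked inside a PDF table, which is exactly the gap the DataLab ran into.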
“Big Data is not our focus, but smaller spreadsheets with key crisis response information”
Which does not preclude algorithmic efforts to link datasets and visualize correlations, even from different organizations
Data cleaning and validation are perhaps the most critical problems for HDX today
5-year qualitative forecast
Data spaces, starting in The Hague: an idea that arose at a conference
Connecting all levels, from headquarters to the field, including all decision-making levels
Realizing that not all humanitarian people are data people is important
May newer generations be more data literate. For the time being, though:
“The best way to guarantee quality and comparability of data across organizations and context, is community building”
Get people engaged around the story of data
Interaction and collaboration efforts will allow us “to tackle problems we were incapable of before”
Do or do not with a solution in mind, there is no try
“We understood the problem” before delving into it
And there was investment in researchers and design thinking
Designers went to the ground in Africa and Colombia
“User driven” was an assumption that validated itself
Data pipelines must be joined with an organizational model around a product
Users were interviewed about HDX personality: mature? playful? lumberjack?
This has helped to overcome the hurdle of popularity
Small armies of trainers on HDX go everywhere to instruct on its use
User research will continue to be pushed as HDX missionary activity.
Data cleaning too (taking care of the data basement makes the whole house work)
It’s all about creating the process (technical, logistical, organizational) that best informs decisions through data
Traditional and innovative HDX funding
“We don’t ever want to charge people for our service. I have never seen it work in this field”
Value-added products can be set up, like workshops
Some solutions are built by request, or funders have a voice in how products should be done
Where Sarah gets her data
“Reading about everything”
Vulnerability Assessment Mapping
HIF Elrha’s Journey to Scale
“There’s so much”
Westworld. “To an extent, HDX is an intelligence; it performs automated tasks on our behalf”
Techonomy conference. This year: IoT
“The biggest disruption is cash” contributions, over in-kind giving. With digital cash, we can track what people do with the money
Please share, participate and leave feedback below!
If you have any feedback you’d like to share for me or Sarah, please leave your thoughts in the comment section below! I read all of them and will definitely take part in the conversation.
If you have any questions you’d like to ask me directly, head on over to the Ask Stephen section. Don’t be shy! Every question is important and I answer every single one.
And, if you truly enjoyed this episode and want to make sure others know about it, please share it now:
Also, ratings and reviews on iTunes are very helpful. Please take a moment to leave an honest review for The TOR Podcast!