StoryWeb: turn stories into research data

StoryWeb will extract structured actor networks from journalistic texts and analyze them. The essential relationships between the people and organizations mentioned become usable as a knowledge graph. In this way, complex stories turn into structured datasets for future research.

Why do this?

Investigative journalism is often about networks of actors: which companies, institutions and politicians are connected to each other, which mafia structures operate across national borders, or which cliques and family structures control the allocation of posts. The article as a form of presentation is a hurdle here: to remain readable, stories often omit essential background and references to other texts and stories.

We want to improve knowledge management in journalism. While the general public is an important consumer of media reporting, there are also many groups that consume news for professional reasons, e.g. business analysts, researchers and other journalists. They read the texts primarily to understand connections that have already been uncovered.

StoryWeb will systematically structure research knowledge and find connections between different stories and the work of different journalists. Important stories and relationships associated with a character will be summarized. This can be used both for an initial overview of a set of stories and for in-depth analysis of individuals and organizations. Information from relevant, structured data sources like OpenSanctions will be linked up to provide additional details for specialists.

How will this work?

In a first step, we want to build partnerships with a small selection of relevant, investigative media whose reporting is most suitable for extracting story graphs.

We'll download articles from such media sites, extract article text and metadata, and annotate relevant person and company names through various named entity extractors.
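The extraction and annotation step could be sketched roughly as follows. This is an illustration only, not StoryWeb's actual code: it uses Python's standard library in place of a real text extractor such as trafilatura (listed under References) and a deliberately naive regular expression in place of a real named entity extractor. The names `Article`, `parse_article` and `NAME_RE` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from html.parser import HTMLParser


class _ParagraphParser(HTMLParser):
    """Collects the <title> and the text of <p> elements from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._stack = []  # currently open <title>/<p> tags

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._stack.append(tag)
            if tag == "p":
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        if not self._stack:
            return  # ignore text outside <title> and <p>
        if self._stack[-1] == "title":
            self.title += data
        else:
            self.paragraphs[-1] += data


@dataclass
class Article:
    url: str
    title: str
    text: str
    # (surface form, start offset, end offset) for each mention in `text`
    mentions: list = field(default_factory=list)


# Assumption: a stand-in for a named entity extractor that simply matches
# runs of two or more capitalized words. A real tagger would replace this.
NAME_RE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def parse_article(url, html):
    """Extract title and body text from HTML and annotate name mentions."""
    parser = _ParagraphParser()
    parser.feed(html)
    parser.close()
    text = "\n".join(p.strip() for p in parser.paragraphs if p.strip())
    article = Article(url=url, title=parser.title.strip(), text=text)
    article.mentions = [
        (m.group(0), m.start(), m.end()) for m in NAME_RE.finditer(text)
    ]
    return article
```

Keeping character offsets with each mention means later steps can go back to the surrounding sentence when looking for clues about a relationship.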

The entities mentioned should be disambiguated and merged between stories. Simple analyses of the text should also provide clues as to the nature of the relationships among the identified actors. These relationships are finally classified into a category system through manual feedback. Automated classification can be developed as we collect a corpus of training data.
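To make the merging and relationship steps more concrete, here is a minimal sketch under stated assumptions. It is not the project's method: the sorted-token key in `name_key` is a crude stand-in for real disambiguation (it would conflate two different people who share a name), and counting same-sentence co-occurrence is only one possible "simple analysis". All function names and the input layout are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations


def name_key(name):
    """Crude normalisation key: lowercase, drop punctuation, sort tokens."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(sorted(tokens))


def merge_mentions(stories):
    """Group surface forms from all stories under one key per entity.

    `stories` maps a story id to a list of (sentence_index, name) mentions.
    """
    entities = defaultdict(lambda: {"names": set(), "stories": set()})
    for story_id, mentions in stories.items():
        for _, name in mentions:
            entity = entities[name_key(name)]
            entity["names"].add(name)
            entity["stories"].add(story_id)
    return dict(entities)


def candidate_links(stories):
    """Count how often two entities are mentioned in the same sentence."""
    counts = defaultdict(int)
    for mentions in stories.values():
        by_sentence = defaultdict(set)
        for sentence, name in mentions:
            by_sentence[sentence].add(name_key(name))
        for keys in by_sentence.values():
            for pair in combinations(sorted(keys), 2):
                counts[pair] += 1
    return dict(counts)
```

The pairs returned by `candidate_links` are only candidates: they say that two actors appear together, not how they are related. Assigning a relationship category would still rely on the manual feedback described above.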

As part of the project, a small number of example story graphs will be built, each focussed on an economic or political scandal in recent history.

We also want to explore uploading some of the resulting relationship information to Wikidata.

Who is this for?

The project generates research material that can be integrated into the research workflows of various user groups:

Journalists receive a systematic picture of the work of their colleagues and can more easily find connections between previously separate stories.

Businesses want to understand the activities of a particular company or person in depth, and to keep an eye on whether there is negative media coverage of one of their customers.

References

Design notes
Source code
Structured Journalism (Gina Chua)
Storyline Ontology (BBC News Lab/Guardian)
Bad Will Hunting (Journalism.ai)
Hyphe (Sciences Po Media Lab)
trafilatura (BBAW)
Awesome Relation Extraction

Contact

Feel free to reach out to us at friedrich@opensanctions.org.

Funding

This project receives financial support from the German Federal Ministry for Education and Research (Bundesministerium für Bildung und Forschung, BMBF) under the grant identifier 01IS22S42. The full responsibility for the content of this publication remains with its authors.