Accepted for oral presentation at the Alternate Track on
Journalism, Misinformation, and Fact-checking at The Web Conference 2018.
This post aims to support our work on selection bias in the context
of news coverage. The full paper can be found here
and the code is available on GitHub.
To cite use the following
We present the motivation for the project, and an interactive data visualisation of the
learned relationships between news sources.
News entities have to select and filter the coverage they broadcast through their respective channels, since the set of world events is too large to be treated exhaustively. The subjective nature of this filtering induces biases due to, among other things, resource constraints, editorial guidelines, ideological affinities, or even the fragmented nature of the information at a journalist's disposal. The magnitude and direction of this bias are, however, largely unknown. The absence of ground truth, the sheer size of the event space, and the lack of an exhaustive set of absolute features to measure make it difficult to observe the bias directly, to characterize its nature, and to factor it out to ensure neutral coverage of the news.
In this work, we introduce a methodology to capture the latent structure of media’s decision process at a large scale. Our contribution is threefold. First, we show media coverage to be predictable using personalization techniques, and evaluate our approach on a large set of events collected from the GDELT database. Second, we show that a personalized and parametrized approach not only exhibits higher accuracy in coverage prediction, but also provides an interpretable representation of the selection bias. Finally, we propose a method that selects a set of sources by leveraging the latent representation. These selected sources provide a more diverse and egalitarian coverage, all while retaining the most actively covered events.
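To make the idea of predicting coverage with personalization techniques more concrete, here is a minimal sketch under stated assumptions. It fits a logistic matrix factorization to a tiny, made-up source-by-event coverage matrix, so that each source and each event gets a low-dimensional vector. This is a generic personalization model chosen for illustration, not necessarily the one used in the paper; the data, the function names (`factorize`, `predict`) and the hyperparameters are all hypothetical.

```python
import math
import random


def sigmoid(x):
    # Numerically safe logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def factorize(coverage, dim=2, epochs=500, lr=0.1, reg=0.01, seed=0):
    """Learn source vectors S and event vectors E such that
    sigmoid(S[i] . E[j]) approximates coverage[i][j] (1 = covered)."""
    rng = random.Random(seed)
    n_sources, n_events = len(coverage), len(coverage[0])
    S = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_sources)]
    E = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_events)]
    cells = [(i, j) for i in range(n_sources) for j in range(n_events)]
    for _ in range(epochs):
        rng.shuffle(cells)
        for i, j in cells:
            # Gradient step on the regularized log-likelihood of one cell.
            err = coverage[i][j] - sigmoid(dot(S[i], E[j]))
            for k in range(dim):
                s_ik, e_jk = S[i][k], E[j][k]
                S[i][k] += lr * (err * e_jk - reg * s_ik)
                E[j][k] += lr * (err * s_ik - reg * e_jk)
    return S, E


def predict(S, E, i, j):
    # Estimated probability that source i covers event j.
    return sigmoid(dot(S[i], E[j]))


# Made-up toy data: rows are sources, columns are events.
coverage = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
S, E = factorize(coverage)
print(round(predict(S, E, 0, 0), 2), round(predict(S, E, 0, 3), 2))
```

In this toy setting the rows of `S` play the role of the latent representation of each source's selection behaviour: sources with similar coverage patterns end up with similar vectors.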
This figure presents a visualisation of the learned low-dimensional representation of the sources' selections in terms of coverage. We then apply an unsupervised clustering algorithm to group the sources by similarity in this space.
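As a rough illustration of the clustering step, the sketch below groups a handful of source vectors with plain k-means. The post does not name the clustering algorithm, so k-means is only a stand-in here, and the source names and coordinates are invented for the example.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical 2-D source representations (not real data).
source_vectors = {
    "source_a": (0.0, 0.1),
    "source_b": (0.2, 0.0),
    "source_c": (5.0, 5.1),
    "source_d": (5.2, 4.9),
}
names = list(source_vectors)
labels, _ = kmeans([source_vectors[n] for n in names], k=2)
clusters = dict(zip(names, labels))
print(clusters)
```

The cluster indices themselves are arbitrary; only the grouping matters, that is, which sources share a label.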