Okay, and Chris is going to talk about data journalism. He'll have some case studies also concerning China with Uyghur camps, which is mainly through aerial data collection as well. And the title is Devolution of Data, and over to you, Chris. Hi, thanks so much for coming and also a big thanks to AMRO like for organizing this space I've been here in 2016 and actually this work that I'm going to present today the person one of the people that I did it with I met here at AMRO so just to add this little tidbit about like the usefulness of these meeting spaces that we have. So my name is Christoph Buschek and I'm traveling between Graz and Berlin. I used to live in Berlin for 10 years but now I spend a lot of time in Graz again which is the place where I'm from and I work as an independent programmer, as a data journalist, as a digital security trainer, as a researcher. It's a lot of different hats that I'm wearing, and the reason why this is, is because I build digital tools, and they're just really useful in a lot of different areas, and people ask me to do this, and therefore I have knowledge of a lot of things, a little bit, but nothing really specific, so I'm a generalist. But in this work, I collaborate with mainly human rights defenders. This is kind of like the area where I come from. And at some point, I also recognized that actually the type of work that I was doing was also very, very useful to journalists. So I started collaborating with journalists. And I was working now in academia as well. And sometimes, just sometimes, I work with artists. Today, I want to talk about these methods and tools for data-driven investigation. Data-driven investigation is a new field that is establishing itself at the moment. And I want to give one example about a project that we did where we tried to expose the scale of the incarcerations in Xinjiang in China. 
And this is a piece of work for which we were awarded the Pulitzer Prize last year, and it was a pretty nice piece of work, in my opinion. So when we talk about data, you hear data a lot. Also the last few days, I heard the word data, data, data, data. So we talk about data a lot, and it's very relevant to our everyday life by now. And in journalism, we have used data at least since the 70s, and back then it was called computer-assisted reporting. And the social sciences and political sciences have used data even longer, going back to the 60s, 50s. But basically what the data was used for was just to run statistics. You had a data set where the information was already available, and then you basically looked at the data differently. You created summaries of the data. You computed means, medians, simple statistical operations. But what I want to talk about is something different, and that's called data-driven. That's a new quality that has established itself in the last few years, and it's this idea that we can use data to discover something that we did not know before. In statistics you already have all the data compiled, but with data-driven investigation we develop processes around data that can tell us something that we did not know before. Sometimes the term computational methods is also used, and the overlaps between these terms are really fuzzy. I choose to call it data-driven as an umbrella term because it's just easier to pronounce. So this field, data-driven investigation, is in use not only in journalism but also in human rights documentation and research, and it is a field that is establishing itself now. So if you asked me now for a definition of data-driven investigation, I would not be able to give you one. But I can tell you some projects where I would say, yeah, that's data-driven. 
Or the work that Giovanna does, for example, it's also very data-driven, where you try to establish new truths based on data. But what I can say definitely is that, like, in this field, data-driven investigation, in order to have it, multiple things have to come together. Like, I, as a programmer alone, I cannot do data-driven investigation. You need, like, technology, you need design, you need storytelling, and you need the investigation part. And it's only if those four things, or different fields come together and collaborate and work together then you can actually establish something that I would call data-driven investigation so this collaborative interdisciplinary approach it's a very essential aspect of this generally like it's not always true but it's often enough true that I can like break down a data-driven investigation with these four different elements and you can think of it as like as a process as a pipeline steps that we go through but it's not always like this like sometimes you approach it very different but very often the first thing we have to do is like we have to preserve data we have to get data from somewhere then the second thing is we have to explore the data to understand what we're actually looking at, then we have to verify the data which in my opinion is one of the most essential aspects of data-driven investigation, and then we also want to narrate the data, we want to tell the story like why we are doing this and what's behind it. So I want to talk about this project, Built to Last, which is a series of articles that were released last year in August, I think, and it's a collaboration with Megha Rajagopalan, a journalist from Buzzfeed News, and Alison Killing, an architect that lives in Rotterdam. And the three of us together, we were trying to figure out like how big, what is the scale of this like regime of prisons in Xinjiang in China. 
So just to sum it up before I come to the process itself: the output of this work is a series of articles. In total, there are five or six different articles published that came out of this work. And what I like about it is that it combines the personal experience of people actually being imprisoned in these camps with systemic proof of the scale: how wide is this regime, what does it entail, how many people are in prison. As a result of our work, we could identify 428 locations in the province of Xinjiang that we would say are camps or are similar to camps. We measured the floor space of the structures that we found, and based on these measurements, we can determine that this regime can hold up to a million people. And because we could trace these camps historically, we could also start to identify a certain policy shift that took place in 2016, 2017. So just a little bit of background information about Xinjiang. I'm really no expert on Chinese politics or Chinese history, nor on Xinjiang, but it's a very vast territory. It's about 20 times the size of Austria, which is really, really huge if you think of it. And it has this population of Uyghur people, who are around 50%, a little bit less, and they are a Muslim minority within China. In the early 2010s, unrest started, violence flared up a little bit. It was not massive violence, but violence definitely took place, targeting Han Chinese living in Xinjiang. And the reaction of the Chinese state was to establish a regime of biometric surveillance and this imprisonment of people. So the goal was probably to pacify the region. I'm sure there are also economic reasons. 
As I said, I'm not an expert, so I don't want to dwell on it too much. So when we started this work we just said, okay, we know there are camps there. In the West we speculated, and we knew of around maybe 20 of those camps, and the speculation was really going wild: it was something between a hundred and a thousand camps, we just did not know. It's not possible for people in the West, journalists, to travel there and actually inspect the region, it's not allowed by the state, so we had to find a way to do this research without being physically there. Megha, who participated in this research, used to live in China, and she had to leave the country because she published an article about one of the camps that she went to visit in the mid-2010s. So the question was really, okay, if we don't know anything, how can we start to know something? This is the fundamental thing, and I think this is also one quality that data-driven investigation gives us: we can start with no knowledge and arrive at a place where we have some knowledge. And so the way that we took is an indirect one. We made a discovery on Baidu Maps, which is the Chinese counterpart to Google Maps. It's very interesting, because we in the West always think there's one internet. But in reality, there are multiple different internets. And China has its very, very own version of an internet. For pretty much every Western service, there's a Chinese counterpart. And so Baidu Maps is the Chinese counterpart to Google Maps. It's a mapping service, and it works and looks pretty similar to Google Maps. And the internet in China also has a very, very strong policy of censorship. 
So we discovered that Baidu Maps is actually censoring some of the locations when we looked at a certain zoom level. And you can see the censorship by these white squares, just drawn over the map. The interesting thing, but this is just a side note, is the way this censorship is implemented. For me it's very, very interesting, maybe not for everyone, but it actually does load the original image and then draws the censorship over the actual image. So it's a very specific way of doing it, and I think it's also an organizational artifact, where you have Baidu Maps delivering the tiles and then some state organization delivering the censorship, and then they have to merge it together in the browser. But so we discovered these censorship tiles on Baidu Maps and then we thought, wow, maybe when we look at the censorship of Baidu Maps, this gives us an indicator so that we know where to look first. Because I want to recall, it's a very vast territory. If you just looked manually at satellite imagery throughout Xinjiang, it would be a team of 50 people working on it for years and years and years. So we have to reduce the problem. We have to make it smaller. We have to make it digestible for us so that we can deal with it. And so we thought, okay, maybe this is an indicator. If Baidu Maps censors a place, it's an indicator for us to look there first. So I wrote a program that systematically went through Baidu Maps and tried to discover these censorship tiles. You can see roughly how this looks on Baidu Maps: you have the actual satellite imagery, and the squares are what Baidu Maps renders over it as censorship. So with Baidu Maps, we ended up with three types of information. On the one hand, we had the satellite tiles, and a tile is just an image that is 256 by 256 pixels. 
Then we have watermarked images that carry no information, and then we have censorship images, which are really places that have been actively censored by Baidu Maps. So as I said, I wrote a program to automate this process, to scrape the data from Baidu Maps, and this was a program that just pretended to be a human in front of a computer. You can really imagine it as if we would sit in front of a monitor and, with a mouse, slowly drag the map from right to left and from left to right. And because I had to imitate the behavior of a human, I also introduced some random behavior. So I scrolled back again, I waited, I scrolled forward, I waited one second, then I scrolled forward one time, I waited one minute, then I scrolled back, and it just went on and on like this. And this program did this automatically. It then ran on 25 different machines for one and a half months to collect all this data. So it's not so much data, but slowing down the process and trying to imitate a human really slows down the whole scraping part. And the result of the scrapes was that we had over 50 million map tiles collected. So this was the province of Xinjiang. And five million of those we could detect as censored. So that's still a lot of data. That's a lot of places to look at. So we tried to reduce it one more time by saying that if you run any kind of prison camp, you need a certain level of logistics. You have to bring people there. You have to bring supplies. The prison guards have to move back and forth. So you need a certain type of infrastructure. 
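The human-like pacing described above can be sketched as a plan of map drags with random pauses and occasional backtracking. This is only a minimal illustration under stated assumptions, not the program used in the project: the function name `plan_row_sweep`, the probabilities and the pause ranges are all hypothetical, and the sketch only produces a schedule; it does not drive a browser or contact any map service.

```python
import random
from dataclasses import dataclass


@dataclass
class Step:
    direction: str  # "right" moves the sweep forward, "left" scrolls back
    pause_s: float  # seconds to wait after the drag


def _pause(rng, short_pause, long_pause, long_pause_prob):
    # Mostly short waits, occasionally a long one (all ranges are assumptions).
    lo, hi = long_pause if rng.random() < long_pause_prob else short_pause
    return rng.uniform(lo, hi)


def plan_row_sweep(n_drags, rng, backtrack_prob=0.15,
                   short_pause=(1.0, 4.0), long_pause=(30.0, 90.0),
                   long_pause_prob=0.05):
    """Plan a sweep across one row of the map that nets n_drags forward drags.

    backtrack_prob must stay below 0.5 so the sweep makes progress on average.
    """
    steps = []
    position = 0  # net number of forward drags so far
    while position < n_drags:
        pause = _pause(rng, short_pause, long_pause, long_pause_prob)
        if position > 0 and rng.random() < backtrack_prob:
            steps.append(Step("left", pause))   # scroll back, as a person might
            position -= 1
        else:
            steps.append(Step("right", pause))  # continue the sweep
            position += 1
    return steps


if __name__ == "__main__":
    plan = plan_row_sweep(20, random.Random(7))
    total = sum(step.pause_s for step in plan)
    print(f"{len(plan)} drags, about {total:.0f} s of waiting")
```

The waiting time dominates the plan, which matches the point made in the talk: the volume of data is modest, but imitating a person makes the collection slow.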
So again we took these five million censored tiles and we matched them against the coordinates of roads, urban spaces and train tracks, and if a tile was within the vicinity of those, then we said, okay, that's something that we want to look at. So we could reduce these five million censored tiles to around eight hundred thousand, which is still a lot. This is really a lot of data, but we said, okay, we can start, and then we see how far we get. So this is where we started: we said, okay, we know nothing, and then we went to a place where we had an indicator of where to start. This was a step that was required for us to then do the actual work, which is looking at satellite imagery and at what's hidden behind these censored tiles. So this is where the act of verification comes in. And for me, in data-driven investigation, in human rights work, in data journalism, verification is one of the most essential aspects. For me, the problem of getting data is not really a problem. It is for others, but to me that's a done deal somehow. If it's available somewhere, we can get it. But verification is the one part that really, really takes time, because it's inherently a human process. When we verify, it consists of multiple things. We look at a piece of data and we have to verify: is it true? Is it what we think we see on this piece of data? 
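The infrastructure filter described at the start of this passage (keeping only censored tiles near roads, urban spaces or train tracks) can be sketched as a distance check. This is a minimal sketch under assumptions: tiles and infrastructure are represented as plain `(lat, lon)` points, the function names and the `max_km` threshold are hypothetical, and the talk does not say which distance or data source was actually used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def near_infrastructure(tiles, infrastructure, max_km):
    """Keep the tile centres that lie within max_km of any infrastructure point."""
    return [
        (lat, lon)
        for lat, lon in tiles
        if any(haversine_km(lat, lon, ilat, ilon) <= max_km
               for ilat, ilon in infrastructure)
    ]


if __name__ == "__main__":
    censored = [(43.80, 87.60), (40.00, 80.00)]  # made-up tile centres
    roads = [(43.81, 87.61)]                     # made-up road sample point
    print(near_infrastructure(censored, roads, max_km=5.0))
```

The comparison here is brute force, every tile against every infrastructure point. For millions of tiles one would more likely use a spatial index or a geospatial database, but the selection rule stays the same.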
And then it's also a process of annotating, like adding to the data to increase the density of the information or to add more information that we're interested in whatever is required for the specific project and so I always call it like verification is like a due process to data before verification data is nothing data is just like a lot a lot of storage and like we don't know what's on this only through a process of verification that we can start to make a statement of fact or statement of truth about data I also think that verification is creative, so we need people for it. We cannot have machines do it for us. There is no automation in the verification. We can use machines only to help us as an aid, but we cannot replace people with it. And as a programmer, also I think verification very often is just a challenge of an interface. Like we have a lot of data in this case, like let's say a million tiles. So how do people look at a million images and make sense of it? Like that's a problem. That's a design problem as well, right? So since I'm not a designer, I try to solve this problem through process, because that's like something I understand and like I can't design and you will also see like clearly I can't design. So this is the verification tool that I wrote. This was the second part of this job that I did where I built a tool for this project. So you can see a few things. So on the one hand you can see very central like it's actually the satellite imagery. So the whole tool was set up for the researchers to look at two monitors at the same time. On the one hand the tool in full screen, on the other monitor like Google Earth as a satellite provider. And here we looked at Google Maps, OpenStreetMap and Wikimap as just like multiple ways to look at the same location and be able to switch around. 
And then everything around you just see like the metadata that we try to annotate around this like where we can make a decision about like what is it that we are looking at. So here on the side of the column we can just see like the steps that like a piece of data has to go through in order for us like to make sense. So you see this is a process approach to like verifying data. It's just like we start with like, okay, we're going to look at it. And then like slowly a piece of data, we have to get confident about it. And slowly through a due process, we like move a piece of data further in the process till we reach it. Either like we reach a point where it's like, okay, now we know what's there. We can make a statement about it. Or we decide that this piece of data is not useful to us and we throw it into the trash. So this is the process part of data verification. And at the bottom you can see the annotation part of this data verification, where we try to identify and annotate the data and make more sense of what it is that we are looking at. So you can see, we were looking at, do we see walls, do we see guard towers? Do we see barbed wire? Like, whatever, like, we thought would be relevant for, like, identifying, like, structures of coercion. And this metadata, this rubric, it's something that we built up, like, by looking at the data, we learned about, like, the problem. And while we learned about it, we enhanced this rubric and added new categories to be able to make better statements about the data that we were looking at. So this is roughly summarized, the process that we took to get to the results. 
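The process approach described above, where a piece of data moves step by step until it is either confirmed or thrown away, and is annotated against a rubric along the way, can be sketched as a small state model. This is an illustration under assumptions, not the verification tool shown in the talk: the stage names, the `Location` class and its methods are hypothetical, and only the three rubric entries (walls, guard towers, barbed wire) come from the talk.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    UNSEEN = "unseen"        # nobody has looked at it yet
    IN_REVIEW = "in_review"  # being examined and annotated
    CONFIRMED = "confirmed"  # we can make a statement about it
    DISCARDED = "discarded"  # not useful, moved to the trash


# Rubric entries named in the talk; the real rubric grew during the project.
RUBRIC = ("walls", "guard_towers", "barbed_wire")


@dataclass
class Location:
    lat: float
    lon: float
    stage: Stage = Stage.UNSEEN
    features: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def _move(self, new_stage, reviewer):
        # Record every transition so a decision can be traced later.
        self.history.append((self.stage, new_stage, reviewer))
        self.stage = new_stage

    def annotate(self, reviewer, **observed):
        unknown = set(observed) - set(RUBRIC)
        if unknown:
            raise ValueError(f"not in rubric: {sorted(unknown)}")
        self.features.update(observed)
        if self.stage is Stage.UNSEEN:
            self._move(Stage.IN_REVIEW, reviewer)

    def confirm(self, reviewer):
        if self.stage is not Stage.IN_REVIEW:
            raise ValueError("only a reviewed location can be confirmed")
        self._move(Stage.CONFIRMED, reviewer)

    def discard(self, reviewer):
        self._move(Stage.DISCARDED, reviewer)


if __name__ == "__main__":
    loc = Location(43.80, 87.60)  # made-up coordinates
    loc.annotate("reviewer_a", walls=True, guard_towers=True)
    loc.confirm("reviewer_a")
    print(loc.stage, loc.features)
```

Keeping the rubric in one place makes it cheap to add a category once the reviewers learn what matters, and the transition history keeps the human decisions visible instead of hiding them behind a final label.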
So we started from knowing nothing, then we went to a place where we knew something but not the thing we were looking for, we just used it as a stepping stone, and then we ended up with a whole bunch of locations and satellite imagery where we were able to make a judgment call, where we could say, we think this is that and this is not. So as I said, these are the results that we came up with in this research, and this research took over one and a half years, and I really have to credit Alison for doing the satellite imagery analysis. So we identified, as I said, 428 locations. We could measure the floor space: without overcrowding, it could hold up to a million people. We could also identify that in 2016, 2017, a change of policy took place. Before that, these camps were more in temporary buildings, old factories, old schools. Around 2016, 2017 they started to build dedicated camps. They built new structures all over Xinjiang just to house people like this. From the data that we produced we could make roughly three bags of data, and that's also something that is very common when working with data: you have to work with probability and not with absolute fact. So the first category is the locations that we were sure about, and we were sure because of our method and because we had a second way of corroborating what we were looking at. Somebody gave us a witness testimony, or we could find a photograph of the place, or something else that, independently of our method, could identify the location. 
The second category is locations that we, with our method, identified and said, this is a camp, but we have no other way to tell. And the third one was where we just don't know: it could be, maybe not, we don't know. But we thought, okay, it's still relevant, because maybe in the future someone else would have an interest in this as well. And so one interesting aspect of data-driven investigation is that, while this output was a series of articles, and that's the main way for people to consume the work, another output was a data set again, and this data set became the basis for somebody else's work. This is a really important aspect of this type of investigation: your output becomes the input of somebody else's work. It is something that we always have to consider, and we have to make it easy for other people to build upon the work that we did before. So I want to show you what it looks like when we look at satellite imagery. If this works and the internet works, we're going to just look at some satellite imagery. Okay, so let me just zoom around. Okay, so this is the satellite imagery of a camp in Dabancheng. It's a very large one. The structure above that you can see can hold up to 30,000 people. The structure below, the second part on the left side, can hold up to 10,000 people. We think the second one, the smaller one, is a higher security part. But, yeah. In Google Earth, you can also look at the history of the satellite imagery. So, for example, we can jump here to 2015, and you can see that nothing existed in 2015 at this location. If you jump forward, so now we are in 2018. You can see it up there in the corner, there's the year. You can see the structure, the construction is underway. 
So one way we know that this is not yet the finished structure is that normally the last thing to be added on a construction site is the asphalt in front of it. As long as the construction is not done, there is no asphalt, because the construction vehicles would destroy it again. So this is a structure that is in the process of being built. If I jump ahead again, so now I'm in March 2020, I can see the camp already built and finished. You can also recognize that here, for example, and down here, the asphalt has been laid out in front of the structures. If I jump one more year again, I can see that basically nothing changed. So in 2020 this structure was already finished. Maybe it was finished in 2019, I don't know, I cannot tell from this. So let's zoom in here into this one. What makes this interesting? If we look at this, we can immediately recognize a few things. On the one hand, it's these concrete walls that we can see going all around the place. And these concrete walls also have guard towers that are in the middle of the walls and in the corners of the walls. If I go back one year to 2020, the imagery changes and has different atmospheric conditions. And therefore, if I zoom in down here, I can actually recognize the shadow of barbed wire on both sides of the wall. So you can see it here, and you can see it out here. This is a shadow of barbed wire. In 2021, the conditions of the satellite image were different, and it's much, much harder to recognize; you cannot necessarily see it. 
Another thing that we can recognize here is this, for example, which is very likely some kind of guard building, and you can also recognize that this building has a pathway onto the walls. So this is a way for the guards to enter or to man the walls, not coming from the inside of the camp but from the outside of the camp. You can additionally recognize fencing around the building here. So this is some kind of guarded yard. Possibly this is where inmates are living, and when they get to go out, this is the yard where they can be outside. If I zoom out a little bit more, I can also recognize this construction at the entrance. So this is some kind of gateway into the complex. That's also very typical, that you can see this one way to enter the place. You can recognize these three buildings that are U-shaped, which are most likely prison or cell buildings. You have some other facilities like this one or this one. We don't know what they are. It could be a visiting center, it could be an administrative building, it could be a medical building. We don't know. Another thing you can notice, of course, is a parking lot. Prisons always have a parking lot, so that's also an indicator. This is basically what the work looked like that Alison did, by looking at these locations, looking at the satellite imagery. We also had other sources of satellite imagery. Google Earth is fantastic, but sometimes it's lacking a certain year or image, and if you have access to a satellite imagery company, that's really, really great, and you should really keep those contacts warm. They're very, very useful. 
So this is what it looks like to look at these structures of coercion from the top, from the eagle-eye view, from above, right? The thing is, this is a very abstract view on what is actually happening there. It's very easy to look at it in a very technical way, abstracted, where you have no emotional or physical connection to it, and this is a problem, of course. But luckily, there's this guy. I don't know him. But he took our list of camps that we published, and he went to 18 facilities in eight different cities. He drove there and took video footage of those facilities. As journalists from the West, we cannot enter, but he also stated in this video, yeah, but I live here, I can drive around. So he did. He just drove around, he took this video footage, he could show what the camps look like on the ground, and I want to show you this now. So this is the camp that we looked at just now, this camp in Dabancheng, and this is how it looks. You can see the way going there. It's this very rocky, sandy area. This is the view on Dabancheng from this little guard building that I showed you before, with the pathway to the wall. This is the side that he's looking at. You can recognize these U-shaped buildings that you could see on the satellite imagery. Here you can see the guard tower. This pole is surveillance. This is the little building with the fenced yard. You can recognize the barbed wire, the prison building. Here you can look at it again more closely, there are more screenshots from this, but as I said, here you can see the video surveillance, you can see the different elements that you could also recognize on the satellite imagery. 
This is a different camp, and this is the last one I'm going to show you, but I want to show it to you also to understand the size of this, how it feels, how big this is. This is a camp that can hold probably up to 10,000 people, maybe a little bit more. But what is really incredible about it is just the sheer dimension of it. So this strip going from the top corner down there, it's over a kilometer. So now the video starts. Where he is turning, this is the beginning of this road. [Video narration, translated from Mandarin:] Gaoke Road. Measured with the Baidu Maps measuring tool, its length is about 1.1 kilometers. Judging by the scale of the buildings, a great many people must be held here. This is probably the largest concentration camp area in the Urumqi region. Note the slogans on the buildings here: the words "labor reform" and "cultural reform" are faintly visible. I'm going to use the same method for the other side. [End of video narration.] So again, these are all structures. We now see how these structures look from the ground, how it is if you stand in front of them. Before, we could only see them from above, which is a very, very different view to take. The aesthetic also changes very, very much if you actually see them standing in front of them. I don't speak Mandarin, but I've been told that this means cultural reform, labor reform. So they actually have these kinds of signs describing what it is they're doing. And here you can just see different details of these structures. So this is the research that we did. My role in this research was building the methods and the tools, and I want to circle back to this. I'm not sure if I'm over time, but I just want to talk a little bit about what it means to build technology for this kind of research, for this kind of work. And of course, I know it's a huge cliche to put Marshall McLuhan on the slide, and especially this quote, but it actually does fit very, very well. 
Because what I want to stress out is that there's a relationship between the tools that we are using and the type of work that we want to do. And this relationship goes two ways. Often we think, okay, we just use our tools and then we can manipulate the world around us. But it is our tools that decide what it is we can manipulate and what it is that we do and how we do it as well so very classical examples about tools is just like if we all use Google Docs we create a structure where communication is always central there's always a central point and we just like circle around the central point and communicate to the to the middle like this is like a structure we are changing the way that we communicate in that we talk to each other by using these tools. And I think this is a very, very important thing to consider. So there's like a two-way relationship between tools, humans, and technology. And also like what is also very, very important to stress out, and I hear it so often, that people claim this, but it's absolutely not true. Like technology is not neutral. Technology is not neutral. Technology is always something, or tools, methods, and technologies are always something that comes out of a context, out of certain power relationships, that is perceived with a target, and is used for something again. And these things are not something that you can just negate and you think, oh, a tool is neutral. It's just like I can use it for good or evil. That's not true. A tool also always encodes a certain system of values. And also as a programmer, of course, it's a pet peeve of mine. It's this focus on the Silicon Valley as the place where technology is happening. 
And while, of course, it is an incredibly interesting place, it's also a very ideological place. Not everything that comes out of Silicon Valley, and I would even argue that probably most of the things that come out of Silicon Valley, are not actually useful to us, do not make sense for us, or do not encode the values that we want to live and use in our work. I think that the tools that we use and the tools that we build have to reflect our values. And I think it's very important to also define up front what those values are. Before I start to build a piece of software, I have to be clear to myself about what values I want to encode in it. A program is always also an encoding of values, of processes, of thinking, of philosophies. And speaking out of this community of data-driven investigation, journalism, human rights work, I think that we should build our own tools. We should not rely on the tools that others built for us and that are meant for something else, especially for these reasons: the ownership of the tools, the ownership of data, the values that are reflected in these tools. And I think it's very, very important that we as communities, in different communities, not just mine but yours as well, really build our own tools and distribute those tools within our communities. There's a second insight that is also quite a cliche in computer science, and it's this idea that we are always doomed to build systems that reflect our social structures. The way that we communicate within our social structures, that's how our systems will look. And the idea is that basically we interface different technological systems along the communication lines of the different departments, groups, hierarchies, power relationships that we have within our social structures. And our digital systems will always reflect those social structures as well. 
And I think this is true, and when I think of my communities, these human rights communities, data journalism, I always think of graphs, of networks that come together, that have multiple connections, that overlay and layer, and that are very interpersonal. And I think our tools should also reflect this kind of structure. We should think about these as the communication structures that we want to support. It's about collaborations, about being able to create new relations between people, organizations, projects. And I think when we think of digital tools, we always have to think of tools as nodes in this wider graph. And there are not just tools in this graph. There are also people in this graph. There are organizations in this graph. There are institutions in this graph. And they all communicate and relate to each other. And I think we really have to think of those tools also as part of this graph. And the last thing, and then I'm done for today, is something I noticed a lot: when we think about software and tools, everybody immediately jumps to features. They always say, oh, if I can use this tool, I can do that. But nobody ever thinks about what it is that we lose when we choose a tool. Every tool is always a choice for something and a trade-off against something, and when we think about tools we should also always consider what we are losing when we choose this tool, and whether this is a trade-off that we are willing to make or whether it is not acceptable. We have to be very, very conscious about it. We should be clear about it up front that we mostly lose more than we win by choosing a tool or building a tool. So I want to thank you a lot. If you want to get in contact with me, you can write me at this email address. I don't have Twitter, no Instagram, nothing, or you can just talk to me, so that's fine as well. So thanks a lot. 
Thank you.