J Research Mid term
LexisNexis question: What is a Boolean operator? What did LexisNexis reveal regarding your project?
(a shortcut for limiting a search, e.g. AND, OR, NOT)
Why Is Data Journalism Important?
- Filtering the Flow of Data
- New Approaches to Storytelling
- Like Photo Journalism with a Laptop ('Data journalism' only differs from 'words journalism' in that we use a different kit. We all sniff out, report, and relate stories for a living. It's like 'photo journalism'; just swap the camera for a laptop.)
- Data Journalism is the Future
- Number-Crunching Meets Word-Smithing
- Updating Your Skill Set
- A Remedy for Information Asymmetry
- An Answer to Data-driven PR
- Providing Independent Interpretations of Official Information
- Dealing with the Data Deluge
- Our Lives are Data
- A Way to Save Time
- An Essential Part of the Journalists' Toolkit
- Adapting to Changes in Our Information Environment
- A Way to See Things You Might Not Otherwise See
- A Way to Tell Richer Stories
Tips for Working with Numbers in the News
- The best tip for handling data is to enjoy yourself. Data can appear forbidding. But allow it to intimidate you and you'll get nowhere. Treat it as something to play with and explore and it will often yield secrets and stories with surprising ease. So handle it simply as you'd handle other evidence, without fear or favour. In particular, think of this as an exercise in imagination. Be creative by thinking of the alternative stories that might be consistent with the data and explain it better, then test them against more evidence.
- Don't confuse skepticism about data with cynicism. Skepticism is good; cynicism has simply thrown up its hands and quit. If you believe in data journalism (and you probably do, or you wouldn't be reading this book), then you must believe that data has something far better to offer than the lies and damned lies of caricature or the killer facts of swivel-eyed headlines. Data often give us profound knowledge, if used carefully. We need to be neither cynical nor naive, but alert.
- If I tell you that drinking has gone up during the recession, you might tell me it's because everyone is depressed. If I tell you that drinking is down, you might tell me it's because everyone is broke. In other words, what the data says makes no difference to the interpretation that you are determined to put on it, namely that things are terrible one way or the other.
- Uncertainty is OK. We associate numbers with authority and certainty. Often as not, the answer is that there is no answer, or the answer may be the best we have but still wouldn't hit a barn door for accuracy. The investigation is a story. The story of how you tried to find out can make great journalism, as you go from one piece of evidence to another — and this applies in spades to the evidence from data, where one number will seldom do.
- The best questions are the old ones: Is that really a big number? Where did it come from? Are you sure it counts what you think it counts?
Data Journalism in Perspective
- In August 2010 some colleagues and I organised what we believe was one of the first international 'data journalism' conferences, which took place in Amsterdam.
- Speaking to experienced data journalists and journalism scholars on Twitter, it seems that one of the earliest formulations of what we now recognise as data journalism was in 2006 by Adrian Holovaty, founder of EveryBlock — an information service which enables users to find out what has been happening in their area, on their block. In his short essay "A fundamental way newspaper sites need to change", he argues that journalists should publish structured, machine-readable data alongside the traditional 'big blob of text'.
- Earlier labels for the same tradition: 'Computer-Assisted Reporting' and 'Precision Journalism'.
- Data journalism is about mass data literacy.
Become Data Literate in 3 Simple Steps
1. How was the data collected?
Amazing GDP growth: The easiest way to show off with spectacular data is to fabricate it. It sounds obvious, but data as commonly commented upon as GDP figures can very well be phony. Former British ambassador Craig Murray reports in his book, Murder in Samarkand, that growth rates in Uzbekistan are subject to intense negotiations between the local government and international bodies. In other words, they have nothing to do with the local economy. GDP is used as the number one indicator because governments need it to watch over their main source of income, VAT. When a government is not funded by VAT, or when it does not make its budget public, it has no reason to collect GDP data and will be better off fabricating it.
2. What's in there to learn?
Risk of multiple sclerosis doubles when working at night: Surely any German in her right mind would stop working night shifts after reading this headline. But the article doesn't tell us what the risk really is in the end. Take 1,000 Germans. A single one will develop MS over his lifetime. Now, if every one of these 1,000 Germans worked night shifts, the number of MS sufferers would jump to 2. The additional risk of developing MS when working in shifts is 1 in 1,000, not 100%. Surely this information is more useful when pondering whether to take the job.
On average, 1 in every 15 Europeans totally illiterate: The above headline looks frightening. It is also absolutely true. Among the 500 million Europeans, 36 million probably don't know how to read. As an aside, 36 million are also under 7 (data from Eurostat).
3. How reliable is the information?
The sample size problem: "80% dissatisfied with the judicial system", says a survey reported in the Zaragoza-based Diario de Navarra. How can one extrapolate from 800 respondents to 46 million Spaniards? Surely this is full of hot air. Not so: when researching a large population (over a few thousand), you rarely need more than a thousand respondents to achieve a margin of error under 3%. It means that if you were to retake the survey with a totally different sample, 9 times out of 10 the answers you'd get would be within a 3% interval of the results you had the first time around. Statistics are a powerful thing, and sample sizes are rarely to blame in dodgy surveys.
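The sample-size claim above can be sanity-checked with the standard worst-case margin-of-error formula for a simple random sample. A minimal Python sketch; the 1,000-respondent figure and 3% threshold come from the text, and z = 1.645 is the value matching the "9 times out of 10" (90%) confidence level:

```python
import math

def margin_of_error(n, p=0.5, z=1.645):
    """Worst-case margin of error for a simple random sample of size n.

    p=0.5 maximizes p*(1-p); z=1.645 corresponds to 90% confidence,
    i.e. the '9 times out of 10' quoted above."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 500, 1000, 2000):
    print(n, round(100 * margin_of_error(n), 1), "%")
# 1,000 respondents -> about 2.6%, under the 3% quoted above,
# regardless of how large the surveyed population is
```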
The £32 Loaf of Bread
A story for Wales on Sunday about how much the Welsh Government is spending on prescriptions for gluten-free products contained the headline figure that it was paying £32 for a loaf of bread. In fact, that £32 covered 11 loaves costing £2.82 each. No one, not the people who answered the written question nor the press office when it was put to them, raised the issue of quantity until the Monday after the story was published. So do not assume that the background notes for government data will help explain what information is being presented, or that the people responsible for the data will realize the data is not clear even when you tell them your mistaken assumption.
Business Models for Data Journalism
Amidst all the interest and hope regarding data-driven journalism, there is one question that newsrooms are always curious about: what are the business models? Terms like "data journalism", and the newest buzzword "data science", may sound like they describe something new, but this is not strictly true; these labels are just ways of characterizing a shift that has been gaining strength over decades. Many journalists seem to be unaware of the size of the revenue that is already generated through data collection, data analytics and visualization. This is the business of information refinement. With data tools and technologies it is increasingly possible to shed light on highly complex issues, be it international finance, debt, demography, education and so on. The term "business intelligence" describes a variety of IT concepts that aim to provide a clear view of what is happening in commercial corporations. The big and profitable companies of our time, including McDonald's, Zara and H&M, rely on constant data tracking to turn a profit. And it works pretty well for them.
Your Right to Data
Before you make a Freedom of Information (FOI) request you should check to see if the data you are looking for is already available, or has already been requested by others.
- Plan Ahead to Save Time: Think about submitting a formal access request whenever you set out to look for information. It's better not to wait until you have exhausted all other possibilities. You will save time by submitting a request at the beginning of your research and carrying out other investigations in parallel. Be prepared for delay: sometimes public bodies take a while to process requests, so it is better to expect this.
- Check the Rules About Fees: Before you start submitting a request, check the rules about fees for either submitting requests or receiving information.
- Know Your Rights: Find out what your rights are before you begin, so you know where you stand and what the public authorities are and are not obliged to do.
- Say That You Know Your Rights: Usually the law does not require that you mention the access to information law or freedom of information act, but this is recommended because it shows you know your legal rights and is likely to encourage correct processing of the request according to the law.
- Keep it Simple: In all countries, it is better to start with a simple request for information and then to add more questions once you get the initial information. That way you don't run the risk of the public institution applying an extension because it is a "complex request".
- Keep it Focused: A request for information only held by one part of a public authority will probably be answered more quickly than one which requires a search across the entire authority.
- Think Inside the Filing Cabinet: Try to find out what data is collated. For example, if you get a blank copy of the form the police fill out after traffic accidents, you can then see what information they do or do not record about car crashes.
- Be Specific: Before you submit your request, think: is it in any way ambiguous? This is especially important if you are planning to compare data from different public authorities.
- Submit Multiple Requests: If you are unsure where to submit your request, there is nothing to stop you submitting the request to two, three or more bodies at the same time.
- Submit International Requests: Increasingly, requests can be submitted electronically, so it doesn't matter where you live. Alternatively, if you do not live in the country where you want to submit the request, you can sometimes send the request to the embassy and they should transfer it to the competent public body.
- Do a Test Run: If you are planning to send the same request to many public authorities, start by sending an initial draft of the request to a few authorities as a pilot exercise.
- Anticipate the Exceptions: If you think that exceptions might be applied to your request, then, when preparing your questions, separate the question about the potentially sensitive information from the information that common sense says should not fall under an exception. Then split your question in two and submit the two requests separately.
- Ask for Access to the Files: If you live near where the information is held (e.g. in the capital, where the documents are kept), you can also ask to inspect original documents.
- Keep a Record! Make your request in writing and save a copy or a record of it, so that in the future you are able to demonstrate that your request was sent in case you need to appeal against a failure to answer.
- Make it Public: Speed up answers by making it public that you submitted a request. If you write or broadcast a story saying that the request has been submitted, it can put pressure on the public institution to process and respond to it.
- Involve Colleagues: If your colleagues are sceptical about the value of access to information requests, one of the best ways to convince them is to write a story based on information you obtained using an access to information law.
- Ask for Raw Data: If you want to analyze, explore or manipulate data using a computer, then you should explicitly ask for data in an electronic, machine-readable format.
- Ask About Organizations Exempt From FOI Laws: You may wish to find out about NGOs, private companies, religious organizations and/or other organizations which are not required to release documents under FOI laws.
Crowdsourcing Data at the Guardian Datablog
Crowdsourcing, according to Wikipedia, is "a distributed problem-solving and production process that involves outsourcing tasks to a network of people, also known as the crowd". Sometimes you will get a ton of files, statistics, or reports which it is impossible for one person to go through. Or you may get hold of material that is inaccessible or in a bad format, and you aren't able to do much with it. This is where crowdsourcing can help. The MPs' Expenses project generated lots of tip-offs. We got more stories than data. The project was remarkably successful in terms of traffic. People really liked it. If I were to give advice to aspiring data journalists who want to use crowdsourcing to collect data, I would encourage them to do it on something that people really care about, and will continue to care about when it stops making front-page headlines. Also, if you make something more like a game, this can really help to engage people. When we did the expenses story a second time, it was much more like a game, with individual tasks for people to do.
Using Data Visualization to Find Insights in Data
Data by itself, consisting of bits and bytes stored in a file on a computer hard drive, is invisible. In order to see and make any sense of data, we need to visualize it. In this chapter I'm going to use a broader understanding of the term visualizing, one that includes even pure textual representations of data.

Using Visualization to Discover Insights: It is unrealistic to expect that data visualization tools and techniques will unleash a barrage of ready-made stories from datasets.

How to Visualize Data: Visualization provides a unique perspective on the dataset. You can visualize data in lots of different ways. Tables are very powerful when you are dealing with a relatively small number of data points. They show labels and amounts in the most structured and organized fashion and reveal their full potential when combined with the ability to sort and filter the data.

Analyze and Interpret What You See: Once you have visualized your data, the next step is to learn something from the picture you created. You could ask yourself: What can I see in this image? Is it what I expected? Are there any interesting patterns? What does this mean in the context of the data? Sometimes you might end up with a visualization that, in spite of its beauty, might seem to tell you nothing of interest about your data. But there is almost always something that you can learn from any visualization, however trivial.

Document Your Insights and Steps: If you think of this process as a journey through the dataset, the documentation is your travel diary. It will tell you where you have traveled, what you have seen there, and how you made your decisions for your next steps. You can even start your documentation before taking your first look at the data.

Transform Data: Naturally, with the insights that you have gathered from the last visualization, you might have an idea of what you want to see next. You might have found some interesting pattern in the dataset which you now want to inspect in more detail. Possible transformations (sketched in code after this list) are:
- Zooming: to have a look at a certain detail in the visualization
- Aggregation: to combine many data points into a single group
- Filtering: to (temporarily) remove data points that are not in our major focus
- Outlier removal: to get rid of single points that are not representative of 99% of the dataset

Which Tools to Use: The question of tools is not an easy one. Every data visualization tool available is good at something. Visualization and data wrangling should be easy and cheap. If changing the parameters of a visualization takes you hours, you won't experiment much.
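The transformations listed above map directly onto a few lines of pandas. A minimal sketch, assuming a hypothetical CSV with "year", "council" and "amount" columns (none of these names come from the chapter):

```python
import pandas as pd

df = pd.read_csv("spending.csv")  # hypothetical dataset

# Filtering: (temporarily) keep only the rows in our current focus
recent = df[df["year"] >= 2010]

# Aggregation: combine many data points into a single group
totals = recent.groupby("council")["amount"].sum()

# Outlier removal: drop single points unrepresentative of 99% of the data
lo, hi = df["amount"].quantile([0.005, 0.995])
trimmed = df[df["amount"].between(lo, hi)]
```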
Data Stories
Data journalism can sometimes give the impression that it is mainly about presentation of data — such as visualizations which quickly and powerfully convey an understanding of an aspect of the figures, or interactive searchable databases which allow individuals to look up, say, their own local street or hospital. But data can also drive the story itself:
2. Proportion: 'Last year local councils spent two-thirds of their stationery budget on paper clips'
3. Internal comparison: 'Local councils spend more on paper clips than on providing meals-on-wheels for the elderly'
4. External comparison: 'Council spending on paper clips last year was twice the nation's overseas aid budget'
Or there are other ways of exploring the data in a contextual or comparative way:
5. Change over time: 'Council spending on paper clips has trebled in the past four years'
6. 'League tables': These are often geographical or by institution, and you must make sure the basis for comparison is fair, e.g. taking into account the size of the local population. 'Borsetshire Council spends more on paper clips for each member of staff than any other local authority, at a rate four times the national average'
Or you can divide the data subjects into groups:
7. Analysis by categories: 'Councils run by the Purple Party spend 50% more on paper clips than those controlled by the Yellow Party'
Or you can relate factors numerically:
8. Association: 'Councils run by politicians who have received donations from stationery companies spend more on paper clips, with spending increasing on average by £100 for each pound donated'
But, of course, always remember that correlation and causation are not the same thing.
So if you're investigating paper clip spending, are you also getting the following figures?
- Total spending, to provide context
- Geographical/historical/other breakdowns, to provide comparative data
- The additional data you need to ensure comparisons are fair, such as population size
- Other data which might provide interesting analysis to compare or relate the spending to
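Several of these story types are one-liners in code once the data sits in a table. A hedged sketch with invented paper-clip figures (every number and column name below is illustrative, not from the text):

```python
import pandas as pd

df = pd.DataFrame({
    "council": ["Borsetshire", "Ambridge", "Felpersham"],
    "clip_spend": [400_000, 120_000, 90_000],  # invented figures
    "staff": [2_000, 1_500, 1_400],
})

# League table: normalize so the comparison is fair (per member of staff)
df["spend_per_staff"] = df["clip_spend"] / df["staff"]
league = df.sort_values("spend_per_staff", ascending=False)
print(league)

# External comparison: total spending as a multiple of a reference budget
reference_budget = 300_000  # invented reference figure
print(df["clip_spend"].sum() / reference_budget)
```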
Who wrote How Google Works?
Eric Schmidt (with Jonathan Rosenberg)
(Who is Glenn Kessler?)
Glenn Kessler (born July 6, 1959) is a veteran diplomatic correspondent who writes the popular "Fact Checker" blog for The Washington Post.
How the Datablog Used Crowdsourcing to Cover Olympic Ticketing
I think the crowdsourcing project that got the biggest response was a piece on the Olympic ticket ballot. Thousands of people in the UK tried to get tickets for the 2012 Olympics and there was a lot of fury that people hadn't got them. People had ordered hundreds of pounds' worth and were told they'd get nothing. But no one really knew if it was just some people complaining quite loudly while actually most people were happy. So we tried to work out a way to find out.

We decided the best thing we could do, in the absence of any good data on the topic, was to ask people. And we thought we'd have to treat it as a light thing because it wasn't a balanced sample. We created a Google form and asked very specific questions. It was actually a long form: it asked how much in value people had ordered tickets for, how much their card had been debited for, which events they went for, this kind of thing. We put it up as a small picture on the front of the site and it was shared around really rapidly. I think this is one of the key things: you can't just think 'what do I want to know for my story', you have to think 'what do people want to tell me right now'. And it's only when you tap into what people want to talk about that crowdsourcing is going to be successful. The volume of responses for this project, which was one of our first attempts at crowdsourcing, was huge. We had a thousand responses in less than an hour and seven thousand by the end of that day.

In this case we used Google Forms. If someone fills in the form, you can see the result as a row on a spreadsheet. This meant that even if results were still coming in, I could open up the spreadsheet and see all of them straight away. I could have tried to do the work in Google, but I downloaded it into Microsoft Excel and then did things like sort it from low to high, and found the people who decided to write in words instead of putting digits for how much they spent, and fixed all of those. I decided to exclude as little as I could. So rather than taking only valid responses, I tried to fix the other ones. People had used foreign currencies, so I converted them to sterling, all of which was a bit painstaking. But the whole analysis was done in a few hours, and I knocked out the obviously silly entries. A lot of people filled it out pointing out they spent nothing on tickets. That's a bit facetious, but fine. That was less than a hundred out of over seven thousand entries. We decided to use Google Docs because it gives complete control over the results. I didn't have to use anyone else's analytic tools, and I can put the data easily into database software or spreadsheets.

In terms of advice for data journalists who want to use crowdsourcing: you have to have very specific things you want to know. Ask things that get multiple-choice responses as much as possible. Try to get some basic demographics of who you are talking to, so you can see if your sample might be biased. If you are asking for amounts and things like this, try in the guidance to specify that answers should be in digits and in a specific currency. A lot won't comply, but the more you hold their hand through it, the better. And always, always add a comment box, because a lot of people will fill out the other fields when what they really want is to give you their opinion on the story, especially on a consumer story or an outrage.
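The clean-up described above (fixing written-out amounts, converting currencies, knocking out silly entries) can also be scripted rather than done by hand in Excel. A sketch assuming hypothetical column names and illustrative exchange rates:

```python
import pandas as pd

responses = pd.read_csv("ticket_survey.csv")  # exported form results (hypothetical file)

# Written-out amounts ("two hundred") become NaN here, flagged for hand-fixing
responses["spent"] = pd.to_numeric(responses["spent"], errors="coerce")

# Convert foreign currencies to sterling (illustrative rates, not real ones)
rates = {"GBP": 1.0, "EUR": 0.85, "USD": 0.64}
responses["spent_gbp"] = responses["spent"] * responses["currency"].map(rates)

# Exclude as little as possible: only knock out the obviously silly entries
cleaned = responses[responses["spent_gbp"].between(0, 20_000)]
print(len(responses) - len(cleaned), "entries dropped")
```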
Harnessing External Expertise Through Hackathons
In March 2010, the Utrecht-based digital culture organization SETUP put on an event called 'Hacking Journalism'. The event was organised to encourage greater collaboration between developers and journalists. 'We organize hackathons to make cool applications, but we can't recognise interesting stories in data. What we build has no social relevance', said the programmers. 'We recognize the importance of data journalism, but we don't have all the technical skills to build the things we want', said the journalists.
Using and Sharing Data: the Black Letter, Fine Print, and Reality
In this section we'll have a quick look at the state of the law with respect to data and databases, and what you can do to open up your data using readily available public licenses and legal tools. To state the obvious, obtaining data has never been easier. Before the widespread publishing of data on the web, even if you had identified a dataset you needed, you'd have to ask whoever had a copy to make it accessible to you, possibly involving paper and the post or a personal visit.

What about downloading data with a program (sometimes called "scraping") and terms of service (ToS)? Consider the previous paragraph: your browser is just such a program. Might ToS permit access by only certain kinds of programs? If you have inordinate amounts of time and money to spend reading such documents, and perhaps asking a lawyer for advice, by all means do. Once you have some data of interest, you can query, pore over, sort, visualize, correlate and perform any other kind of analysis you like using your copy of the data.

Remember how copyright restricts creative works: if the copyright holder hasn't given permission to use a work (and the work is not in the public domain, and your use isn't covered by exceptions and limitations such as fair use) and you use — distribute, perform, etc. — the work anyway, the copyright holder could force you to stop.
Following the Money: Cross-Border Collaboration
Investigative journalists and citizens interested in uncovering organised crime and corruption that affect the lives of billions worldwide gain, with each passing day, unprecedented access to information. Huge volumes of data are made available online by governments and other organizations, and it seems that much-needed information is more and more within everyone's grasp. However, at the same time, corrupt officials in governments and organised crime groups are doing their best to conceal information in order to hide their misdeeds. They make efforts to keep people in the dark while conducting ugly deals that cause disruption at all levels of society and lead to conflict, famine or other types of crisis.

- Think Outside Your Country: In many instances it is much easier to get information from abroad than from within the country where the investigative journalist operates. Information gathered from abroad via foreign information databases, or by using other countries' access to information laws, might be just what is needed to put the investigative puzzle together.
- Make Use of the Existing Investigative Journalism Networks: Investigative journalists all over the world are grouped in organizations such as the Organized Crime and Corruption Reporting Project, the African Forum for Investigative Reporting, the Arab Reporters for Investigative Journalism, and the Global Investigative Journalism Network.
- Make Use of Technology and Collaborate with Hackers: Software helps investigative journalists access and process information. Various types of software assist the investigator in cutting through the noise, in digging through and making sense of large volumes of data, and in finding the right documents needed to break the story.
How to Hire a Hacker
Journalists are power-users of data-driven tools and services. From the perspective of developers, journalists think outside the box to use data tools in contexts developers haven't always considered before (feedback is invaluable!), and they also help to build context and buzz around projects and make them relevant. It is a symbiotic relationship. Here are a few ideas for finding a developer:

- Post on job websites: Identify and post to websites aimed at developers who work in different programming languages, for example the Python Job Board.
- Contact relevant mailing lists: For example, the NICAR-L and Data Driven Journalism mailing lists.
- Contact relevant organizations: For example, if you want to clean up or scrape data from the web, you could contact an organization such as ScraperWiki, who have a great address book of trusted and willing coders.
- Join relevant groups/networks: Look out for initiatives such as Hacks/Hackers which bring journalists and techies together. Hacks/Hackers groups are now springing up all around the world. You could also try posting something to their jobs newsletter.
- Local interest communities: You could try doing a quick search for an area of expertise in your area (e.g. 'javascript' + 'london'). Sites such as Meetup.com can also be a great place to start.
- Hackathons and competitions: Whether or not there is prize money available, app and visualization competitions and development days are often fruitful ground for collaboration and making connections.
- Ask a geek: Geeks hang around with other geeks. Word of mouth is always a good way to find good people to work with.

Once you've found a hacker, how do you know if they are any good? We asked Alastair Dant, the Guardian's Lead Interactive Technologist, for his views on how to spot a good one:

- They code the full stack: When dealing with deadlines, it's better to be a jack of all trades than a master of one. News apps require data wrangling, dynamic graphics and derring-do.
- They see the whole picture: Holistic thinking favours narrative value over technical detail. I'd rather hear one note played with feeling than unceasing virtuosity in obscure scales. Find out how happy someone is to work alongside a designer.
- They tell a good story: Narrative presentation requires arranging things in space and time. Find out what project they're most proud of and ask them to walk you through how it was built — this will reveal as much about their ability to communicate as their technical understanding.
- They talk things through: Building things fast requires mixed teams working towards common goals. Each participant should respect their fellows and be willing to negotiate. Unforeseen obstacles often require rapid re-planning and collective compromise.
- They teach themselves: Technology moves fast, and it's a struggle to keep up. Having met good developers from all sorts of backgrounds, the most common trait is a willingness to learn new stuff on demand.

How To Find Your Dream Developer: The productivity difference between a good and a great developer is not linear, it's exponential. Hiring well is extremely important. Unfortunately, hiring well is also very difficult. It's hard enough to vet candidates if you are not an experienced technical manager. Add to that the salaries that news organizations can afford to pay, and you've got quite a challenge. At the Tribune, we recruit with two angles: an emotional appeal and a technical appeal. The emotional appeal is this: journalism is essential to a functioning democracy. Work here and you can change the world.
Technically, we promote how much you'll learn. Our projects are small, fast and iterative. Every project brings a new set of tools, a new language, a new topic (fire safety, the pension system) that you must learn. The newsroom is a crucible. I've never managed a team that has learned so much, so fast, as our team.

As for where to look, we've had great luck finding great hackers in the open government community. The Sunlight Labs mailing list is where do-gooder nerds with shitty day jobs hang out at night. Another potential resource is Code for America. Every year a group of fellows emerges from CfA, looking for their next big project. And as a bonus, CfA has a rigorous interview process — they've already done the vetting for you. Nowadays, programming-interested journalists are also emerging from journalism schools. They're green, but they've got tons of potential.

Lastly, it's not enough to just hire developers. You need technical management. A lone-gun developer (especially fresh from journalism school, with no industry experience) is going to make many bad decisions. Even the best programmer, when left to her own devices, will choose technically interesting work over doing what's most important to your audience. Call this hire a news applications editor, a project manager, whatever. Just like writers, programmers need editors, mentorship and somebody to wrangle them towards making software on deadline.
Data Journalists Discuss Their Tools of Choice
- Lisa Evans, The Guardian: We're currently using Google products quite heavily for this reason. All the datasets we tidy and release are available as a Google Spreadsheet, which means people with a Google account can download the data, import it into their own account and make their own charts, sort the data and create pivot tables, or they can import the data into a tool of their choice.
- Cynthia O'Murchu, Financial Times: My advice is to learn Excel and do some simple stories first. Start out small and work your way up to database analysis and mapping. You can do so much in Excel — it's an extremely powerful tool and most people don't even use a fraction of its functionality.
- Scott Klein, ProPublica: Django, which is built on top of the Python programming language, was developed by Adrian Holovaty and a team working in a newsroom.
- Cheryl Phillips, Seattle Times: Using a spreadsheet back when everything was in DOS enabled me to understand a complex formula for the partnership agreement for the owners of the Texas Rangers — back when George W. Bush was one of the key owners. A spreadsheet can help me flag outliers or mistakes in calculations.
- Gregor Aisch, Open Knowledge Foundation: I'm a big fan of Python. Python is a wonderful open source programming language which is easy to read and write (e.g. you don't have to type a semicolon after each line). More importantly, Python has a tremendous user base and therefore has plugins (called packages) for literally everything you need.
- Steve Doig, Walter Cronkite School of Journalism of Arizona State University: My go-to tool is Excel, which can handle the majority of CAR problems and has the advantages of being easy to learn and available to most reporters.
- Brian Boyer, Chicago Tribune: Our tools of choice include Python and Django.
- Pedro Markun, Transparência Hacker: As a grassroots community without any technical bias, we at Transparência Hacker use a lot of different tools and programming languages.
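The spreadsheet workflows quoted above (sorting, pivot tables, flagging outliers) have direct equivalents in Python's pandas library. A minimal sketch with hypothetical column names:

```python
import pandas as pd

df = pd.read_csv("dataset.csv")  # hypothetical file

# Sort from low to high, as in the Excel workflows described above
df = df.sort_values("amount")

# A pivot table: totals broken down by two categorical variables
pivot = pd.pivot_table(df, values="amount", index="region",
                       columns="year", aggfunc="sum")

# Flag outliers: values more than 3 standard deviations from the mean
outliers = df[(df["amount"] - df["amount"].mean()).abs()
              > 3 * df["amount"].std()]
```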
We asked some of our contributors for their favorite examples of data journalism and what they liked about them. Here they are.
My favourite example is the Las Vegas Sun's 2010 Do No Harm series on hospital care (see Figure 5). The Sun analyzed more than 2.9 million hospital billing records, which revealed more than 3,600 preventable injuries, infections and surgical mistakes. They obtained data through a public records request and identified more than 300 cases in which patients died because of mistakes that could have been prevented. It contains different elements, including: an interactive graphic which allows the reader to see, by hospital, where surgical injuries happened more often than would be expected; a map with a timeline that shows infections spreading hospital by hospital; and an interactive graphic that allows users to sort data by preventable injuries or by hospital to see where people are getting hurt. I like it because it is very easy to understand and navigate. Users can explore the data in a very intuitive way. It also had a real impact: the Nevada legislature responded with six pieces of legislation. The journalists involved worked very hard to acquire and clean up the data. One of the journalists, Alex Richards, sent data back to hospitals and to the state at least a dozen times to get mistakes corrected.
- Government Employee Salary Database
- Full-text visualization of the Iraq War Logs, Associated Press
- One of my favorite pieces of data journalism is the "Murder Mysteries" project by Tom Hargrove of the Scripps Howard News Service (Figure 8). He built, from government data and public records requests, a demographically detailed database of more than 185,000 unsolved murders, and then designed an algorithm to search it for patterns suggesting the possible presence of serial killers.
- I love ProPublica's Message Machine story and nerd blog post (Figure 9). It all got started when some Twitterers expressed their curiosity about having received different emails from the Obama campaign.
- One of my favourite data journalism projects is Andrew Garcia Phillips' work on Chartball (Figure 10). Andrew is a huge sports fan with a voracious appetite for data, a terrific eye for design and the capacity to write code.
Our Stories Come As Code
OpenDataCity was founded towards the end of 2010. At that time there was pretty much nothing happening in Germany that you could call data journalism. Why did we do this? Many times we heard people working for newspapers and broadcasters say: "No, we are not ready to start a dedicated data journalism unit in our newsroom. But we would be happy to outsource this to someone else."
- Data projects don't date
- You can build on your past work
- Data journalism pays for itself
The ABC's Data Journalism Play
Our traditions are independent public service journalism. The ABC is regarded as the most trusted news organization in the country. These are exciting times, and under a managing director — the former newspaper executive Mark Scott — content makers at the ABC have been encouraged, as the corporate mantra puts it, to be 'agile'. Of course, that's easier said than done. But one initiative in recent times designed to encourage this has been a competitive staff pitch for money to develop multi-platform projects. This is how the ABC's first ever data journalism project was conceived.

Sometime early in 2010 I wandered into the pitch session to face three senior 'ideas' people with my proposal. It was my argument that no doubt within 5 years the ABC would have its own data journalism unit. It was inevitable, I opined. But the question was how we were going to get there, and who was going to start. For those readers unfamiliar with the ABC, think of a vast bureaucracy built up over 70 years. Its primary offering was always radio and television. With the advent of online in the last decade, this content offering unfurled into text, stills and a degree of interactivity previously unimagined. The web space was forcing the ABC to rethink how it cut the cake (money) and what kind of cake it was baking (content). It is of course a work in progress. But something else was happening with data journalism. Government 2.0 (which, as we discovered, is largely observed in the breach in Australia) was starting to offer new ways of telling stories that were hitherto buried in the zeros and ones.

On 24 November 2011 the ABC's multi-platform project and ABC News Online went live with 'Coal Seam Gas by the Numbers'. It was five pages of interactive maps, data visualizations and text. It wasn't exclusively data journalism, but a hybrid of journalisms born of the mix of people on the team and the story, which, to put it in context, is raging as one of the hottest issues in Australia.

Our team:
- A web developer and designer
- A lead journalist
- A part-time researcher with expertise in data extraction, Excel spreadsheets and data cleaning
- A part-time junior journalist
- A consultant executive producer
- An academic consultant with expertise in data mining, graphic visualization and advanced research skills
- The services of a project manager and the administrative assistance of the ABC's multi-platform unit
- Importantly, a reference group of journalists and others whom we consulted on a needs basis

Where did we get the data from? The data for the interactive maps were scraped from shapefiles (a common kind of file for geospatial data) downloaded from government websites. Other data on salt and water were taken from a variety of reports. The data on chemical releases was taken from environmental permits issued by the government.

What did we learn? 'Coal Seam Gas by the Numbers' was ambitious in content and scale. The big picture: some ideas. Big media organizations need to engage in capacity building to meet the challenges of data journalism. My hunch is there are a lot of geeks and hackers hiding in media technical departments desperate to get out. So we need 'hack and hacker meets' workshops where the secret geeks, younger journalists, web developers and designers come out to play with more experienced journalists for skill sharing and mentoring. Task: download this dataset and go for it! Ipso facto, data journalism is interdisciplinary.
Data journalism teams are made of people who would not in the past have worked together. The digital space has blurred the boundaries. We live in a fractured, distrustful body politic. The business model that formerly delivered professional independent journalism — imperfect as it is — is on the verge of collapse. Data journalism is just another tool by which we will navigate the digital space. It's where we will map, flip, sort, filter, extract and see the story amidst all those 0s and 1s. In the future we'll be working side by side with the hackers, the developers, the designers and the coders. It's a transition that requires serious capacity building. We need news managers who "get" the digital/journalism connection to start investing in the build.
Who wrote Planet Google?
Randall Stross
Kaas & Mulvad: Semi-finished Content for Stakeholder Groups
Stakeholder media is an emerging sector, largely overlooked by media theorists, which has the potential to have a tremendous impact either through online networks or by providing content to news media. It can be defined as (usually online) media that is controlled by organizational or institutional stakeholders, and which is used to advance certain interests and communities. NGOs typically create such media; so do consumer groups, professional associations, labour unions, etc. The key limit on its ability to influence public opinion or other stakeholders is that it often lacks the capacity to undertake discovery of important information, even more so than the downsized news media. Kaas og Mulvad, a for-profit Danish corporation, is one of the first investigative media enterprises that provides expert capacity to these stakeholder outlets. The firm originated in 2007 as a spinoff of the non-profit Danish Institute for Computer-Assisted Reporting (Dicar), which sold investigative reports to media and trained journalists in data analysis. Its founders, Tommy Kaas and Nils Mulvad, were previously reporters in the news industry. Their new firm offers what they call "data plus journalistic insight" (content which remains semi-finished, requiring further editing or rewriting) mainly to stakeholder media, which finalise the content into news releases or stories and distribute it through both news media and their own outlets (such as websites).

Example projects:
- Unemployment Map for 3F
- Living Conditions for 3F
- Debt for "Ugebrevet A4"
- Dangerous Facilities in Denmark
- Corporate Responsibility Data for Vestas
- Name Map for Experian
- Smiley Map for Ekstra Bladet

Processes: innovative IT plus analysis. Value created: personal and firm brands, and revenue.
A Five Minute Field Guide
Streamlining Your Search: While they may not always be easy to find, many databases on the web are indexed by search engines, whether the publisher intended this or not. Here are a few tips:
- When searching for data, make sure that you include both search terms relating to the content of the data you're trying to find and some information on the format or source that you would expect it to be in. Google and other search engines allow you to search by file type. You can also search by part of a URL. Googling for 'inurl:downloads filetype:xls' will try to find all Excel files that have "downloads" in their web address (if you find a single download, it's often worth checking what other results exist for the same folder on the web server).

Browse data sites and services: Over the last few years a number of dedicated data portals, data hubs and other data sites have appeared on the web.
- The Data Hub: a community-driven resource run by the Open Knowledge Foundation that makes it easy to find, share and reuse openly available sources of data, especially in ways that are machine-automated.
- ScraperWiki: an online tool to make the process of extracting "useful bits of data easier so they can be reused in other apps, or rummaged through by journalists and researchers." Most of the scrapers and their databases are public and can be re-used.
- The World Bank and United Nations data portals provide high-level indicators for all countries, often for many years in the past.
- A number of startups are emerging that aim to build communities around data sharing and resale. This includes Buzzdata, a place to share and collaborate on private and public datasets, and data shops such as Infochimps and DataMarket.
- DataCouch: a place to upload, refine, share and visualize your data.
- Freebase, an interesting Google subsidiary, provides "an entity graph of people, places and things, built by a community that loves open data."
- Research data: there are numerous national and disciplinary aggregators of research data, such as the UK Data Archive. While there will be lots of data that is free at the point of access, there will also be much data that requires a subscription, or which cannot be reused or redistributed without asking permission first.

Other routes:
- Ask a Forum
- Ask a Mailing List
- Join Hacks/Hackers
- Ask an Expert
- Learn About Government IT
- Search again using phrases and improbable sets of words you've spotted since last time
- Write an FOI Request
What is The Data Journalism Handbook and who wrote it?
The Data Journalism Handbook was born at a 48-hour workshop at MozFest 2011 in London. It subsequently spilled over into an international, collaborative effort involving dozens of data journalism's leading advocates and best practitioners.
The Survey
The European Journalism Centre conducted a survey to find out more about the training needs of journalists. We found there is a big willingness to get out of the comfort zone of traditional journalism and to invest time in mastering the new skills. The results of the survey showed us that journalists see the opportunity, but need a bit of support to cut through the initial problems that keep them from working with data. There is a confidence that, should data journalism become more widely adopted, the workflows, the tools and the results will improve quite quickly.
Behind the Scenes at the Guardian Datablog
The Guardian Datablog — which I edit — was to be a small blog offering the full datasets behind our news stories. Now it consists of a front page (guardian.co.uk/data); searches of world government and global development data; data visualizations from around the web and by Guardian graphic artists; and tools for exploring public spending data. Every day, we use Google spreadsheets to share the full data behind our work; we visualize and analyze that data, then use it to provide stories for the newspaper and the site.
Data Journalism at the Zeit Online
The PISA-based Wealth Comparison project is an interactive visualization that enables comparison of standards of living in different countries. The interactive uses data from the OECD's comprehensive world education ranking report, PISA 2009, published in December 2010. The report is based on a questionnaire which asks fifteen-year-old pupils about their living situation at home. The idea was to analyze and visualize this data to provide a unique way of comparing standards of living in different countries.

At Zeit Online, we've found that our data journalism projects have brought us a lot of traffic and have helped us to engage audiences in new ways. For example, there was wide coverage of the situation at the nuclear plant in Fukushima after the tsunami in Japan. We made a map showing how many people would have to be evacuated in a similar situation in Germany. The result: lots and lots of traffic, and the project went viral across the social media sphere. Data journalism projects can be relatively easily adapted to other languages. We created an English-language version about proximity to nuclear power plants in the US, which was a great traffic driver. News organizations want to be recognized as trusted and authoritative sources amongst their readers. We find that data journalism projects, combined with enabling our readers to look at and reuse the raw data, bring us a high degree of credibility.

To take another example: the German Federal Statistical Office has published a great dataset of vital statistics for Germany, including modelling of various demographic scenarios up to 2060. The typical way to represent this is a population pyramid, such as the one from the Federal Statistics Agency.
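A population pyramid of the kind mentioned above is straightforward to sketch in matplotlib. The age bands and counts below are invented toy numbers, not the Federal Statistical Office data:

```python
import matplotlib.pyplot as plt

ages = ["0-19", "20-39", "40-59", "60-79", "80+"]
men = [8.1, 9.9, 12.4, 8.6, 2.1]    # millions, invented figures
women = [7.7, 9.6, 12.2, 9.4, 3.6]  # millions, invented figures

fig, ax = plt.subplots()
ax.barh(ages, [-m for m in men], label="Men")  # men plotted to the left
ax.barh(ages, women, label="Women")            # women to the right
ax.set_xlabel("Population (millions)")
ax.set_title("Population pyramid (toy data)")
ax.legend()
plt.show()
```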
Contributors to The Data Journalism Handbook
The following people have drafted or otherwise directly contributed to text which is in the current version of the book. The illustrations are by graphic designer Kate Hudson.

Gregor Aisch, Open Knowledge Foundation; Brigitte Alfter, Journalismfund.eu; David Anderton, Freelance Journalist; James Ball, The Guardian; Caelainn Barr, Citywire; Mariana Berruezo, Hacks/Hackers Buenos Aires; Michael Blastland, Freelance Journalist; Mariano Blejman, Hacks/Hackers Buenos Aires; John Bones, Verdens Gang; Marianne Bouchart, Bloomberg News; Liliana Bounegru, European Journalism Centre; Brian Boyer, Chicago Tribune; Paul Bradshaw, Birmingham City University; Wendy Carlisle, Australian Broadcasting Corporation; Lucy Chambers, Open Knowledge Foundation; Sarah Cohen, Duke University; Alastair Dant, The Guardian; Helen Darbishire, Access Info Europe; Chase Davis, Center for Investigative Reporting; Steve Doig, Walter Cronkite School of Journalism of Arizona State University; Lisa Evans, The Guardian; Tom Fries, Bertelsmann Stiftung; Duncan Geere, Wired UK; Jack Gillum, Associated Press; Jonathan Gray, Open Knowledge Foundation; Alex Howard, O'Reilly Media; Bella Hurrell, BBC; Nicolas Kayser-Bril, Journalism++; John Keefe, WNYC; Scott Klein, ProPublica; Alexandre Léchenet, Le Monde; Mark Lee Hunter, INSEAD; Andrew Leimdorfer, BBC; Friedrich Lindenberg, Open Knowledge Foundation; Mike Linksvayer, Creative Commons; Mirko Lorenz, Deutsche Welle; Esa Mäkinen, Helsingin Sanomat; Pedro Markun, Transparência Hacker; Isao Matsunami, Tokyo Shimbun; Lorenz Matzat, OpenDataCity; Geoff McGhee, Stanford University; Philip Meyer, Professor Emeritus, University of North Carolina at Chapel Hill; Claire Miller, WalesOnline; Cynthia O'Murchu, Financial Times; Oluseun Onigbinde, BudgIT; Djordje Padejski, Knight Journalism Fellow, Stanford University; Jane Park, Creative Commons; Angélica Peralta Ramos, La Nacion (Argentina); Cheryl Phillips, The Seattle Times; Aron Pilhofer, New York Times; Lulu Pinney, Freelance Infographic Designer; Paul Radu, Organised Crime and Corruption Reporting Project; Simon Rogers, The Guardian; Martin Rosenbaum, BBC; Amanda Rossi, Friends of Januária; Martin Sarsale, Hacks/Hackers Buenos Aires; Fabrizio Scrollini, London School of Economics and Political Science; Sarah Slobin, Wall Street Journal; Sergio Sorin, Hacks/Hackers Buenos Aires; Jonathan Stray, The Overview Project; Brian Suda, (optional.is); Chris Taggart, OpenCorporates; Jer Thorp, The New York Times R&D Group; Andy Tow, Hacks/Hackers Buenos Aires; Luk N. Van Wassenhove, INSEAD; Sascha Venohr, Zeit Online; Jerry Vermanen, NU.nl; César Viana, University of Goiás; Farida Vis, University of Leicester; Pete Warden, Independent Data Analyst and Developer; Chrys Wu, Hacks/Hackers
How the News Apps Team at Chicago Tribune Works
The news applications team at the Chicago Tribune is a band of happy hackers embedded in the newsroom. We work closely with editors and reporters to help: (1) research and report stories, (2) illustrate stories online and (3) build evergreen web resources for the fine people of Chicagoland.
Start With the Data, Finish With a Story
To draw your readers in you have to be able to hit them with a headline figure that makes them sit up and take notice. You should almost be able to read the story without having to know that it comes from a dataset. Make it exciting and remember who your audience are as you go.
Data Journalism at the BBC
The term 'data journalism' can cover a range of disciplines and is used in varying ways in news organizations, so it may be helpful to define what we mean by 'data journalism' at the BBC. Broadly, the term covers projects that use data to do one or more of the following:
- Enable a reader to discover information that is personally relevant
- Reveal a story that is remarkable and previously unknown
- Help the reader to better understand a complex issue
These categories may overlap, and in an online environment can often benefit from some level of visualization.

Make It Personal: On the BBC News website we have been using data to provide services and tools for our users for well over a decade.

Simple Tools: As well as providing ways to explore large data sets, we have also had success creating simple tools for users that provide personally relevant snippets of information. These tools appeal to the time-poor, who may not choose to explore lengthy analysis. The ability to easily share a 'personal' fact is something we have begun to incorporate as standard.

Mining the Data: In this area we have found it most productive to partner with investigative teams or programs which have the expertise and time to investigate a story. The BBC current affairs program Panorama spent months working with the Centre for Investigative Journalism, gathering data on public sector pay.

Understanding an Issue: But data journalism doesn't have to be an exclusive no one else has spotted. The job of the data visualization team is to combine great design with a clear editorial narrative to provide a compelling experience for the user. Engaging visualizations of the right data can be used to give a better understanding of an issue or story, and we frequently use this approach in our storytelling at the BBC.

Team Overview: The team that produces data journalism for the BBC News website comprises about 20 journalists, designers and developers.
Basic Steps in Working with Data
There are at least three key concepts you need to understand when starting a data project:
- Data requests should begin with a list of questions you want to answer.
- Data often is messy and needs to be cleaned.
- Data may have undocumented features.

Know the Questions You Want to Answer.

Cleaning Messy Data: One of the biggest problems in database work is that you will often be analyzing data that was gathered for bureaucratic reasons rather than for analysis.

Data May Have Undocumented Features: The Rosetta Stone of any database is the so-called data dictionary. Typically, this file (it may be text or PDF or even a spreadsheet) will tell you how the data file is formatted (delimited text, fixed-width text, Excel, dBase, et al.), the order of the variables, the names of each variable, and the datatype of each variable (text string, integer, decimal, et al.).
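Putting the data dictionary to work in code means spelling out the format, variable order, names and types explicitly rather than letting the software guess. A minimal sketch; the file name, delimiter and variables below are hypothetical stand-ins for whatever the dictionary actually documents:

```python
import pandas as pd

# Taken (hypothetically) from the data dictionary: variable order, names, types
columns = ["case_id", "date", "amount", "agency"]
dtypes = {"case_id": str, "amount": float, "agency": str}

df = pd.read_csv("records.txt", sep="|", header=None,
                 names=columns, dtype=dtypes, parse_dates=["date"])
print(df.dtypes)  # confirm the types match what the dictionary promises
```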
Why Journalists Should Use Data
Today news stories flow in as they happen, from multiple sources, eye-witnesses and blogs, and what has happened is filtered through a vast network of social connections, being ranked, commented on and, more often than not, ignored. This is why data journalism is so important. Gathering, filtering and visualizing what is happening beyond what the eye can see has a growing value. Using data, the job of journalists shifts its main focus from being the first to report to being the ones telling us what a certain development might actually mean. The range of topics can be far and wide: the next financial crisis in the making; the economics behind the products we use; the misuse of funds or political blunders, presented in a compelling data visualization that leaves little room to argue with it.

This is why journalists should see data as an opportunity. They can, for example, reveal how some abstract threat such as unemployment affects people based on their age, gender or education. Using data transforms something abstract into something everyone can understand and relate to. They can create personalized calculators to help people make decisions, be it buying a car or a house, deciding on an education or professional path in life, or doing a hard check on costs to keep out of debt. They can analyze the dynamics of a complex situation like riots or political debates, show fallacies and help everyone to see possible solutions to complex problems. Additionally, getting into data journalism offers a future perspective. Today, when newsrooms downsize, most journalists hope to switch to public relations. Data journalists or data scientists, though, are already a sought-after group of employees, not only in the media.
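The "personalized calculator" idea in miniature: a standard fixed-rate loan-repayment formula, with every figure below invented purely for illustration:

```python
def monthly_payment(price, deposit, annual_rate, years):
    """Fixed-rate loan repayment (standard annuity formula)."""
    principal = price - deposit
    r = annual_rate / 12  # monthly interest rate
    n = years * 12        # number of payments
    return principal * r / (1 - (1 + r) ** -n)

# e.g. a 20,000 car with a 2,000 deposit at 5% over five years
print(round(monthly_payment(20_000, 2_000, 0.05, 5), 2))  # ~339.68 per month
```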
All the President's Men
Two green reporters and rivals working for the Washington Post, Bob Woodward (Robert Redford) and Carl Bernstein (Dustin Hoffman), research the botched 1972 burglary of the Democratic Party Headquarters at the Watergate apartment complex. With the help of a mysterious source, code-named Deep Throat (Hal Holbrook), the two reporters make a connection between the burglars and a White House staffer. Despite dire warnings about their safety, the duo follows the money all the way to the top.
Wobbing Works. Use it!
Using freedom of information legislation — or wobbing, as it is sometimes called — is an excellent tool. But it requires method and, often, persistence. Here are three examples illustrating the strengths and challenges of wobbing from my work as an investigative journalist.

Case Study 1: Farm Subsidy. Every year the EU pays almost €60 billion to farmers and the farming industry. Every year. This has been going on since the late 1950s, and the political narrative was that the subsidies help our poorest farmers.

Case Study 2: Side Effects. We are all guinea pigs when it comes to taking medicine. Drugs can have side effects. We all know this; we balance potential benefits against potential risks, and we make a decision. Unfortunately, this is often not an informed decision.

Case Study 3: Smuggling Death. Recent history can be utterly painful for entire populations, particularly after wars and in times of transition. So how can journalists obtain hard data to investigate when, for example, last decade's war profiteers are now in power? This was the task that a team of Slovenian, Croatian and Bosnian journalists set out to pursue.
What is Data Journalism?
What makes data journalism different from the rest of journalism? Perhaps it is the new possibilities that open up when you combine the traditional 'nose for news' and ability to tell a compelling story with the sheer scale and range of digital information now available. Data journalism can help a journalist tell a complex story through engaging infographics. Data can be the source of data journalism, or it can be the tool with which the story is told — or it can be both. Like any source, it should be treated with scepticism; and like any tool, we should be conscious of how it can shape and restrict the stories that are created with it.
Getting Data from the Web
You've tried everything else, and you haven't managed to get your hands on the data you want. You've found the data on the web, but, alas, no download options are available and copy-paste has failed you. Fear not, there may still be a way to get the data out.

What is machine-readable data? Machine-readable data is created for processing by a computer, instead of presentation to a human user.

Scraping web sites: what for? Everyone has done this: you go to a web site, see an interesting table and try to copy it over to Excel so you can add some numbers up or store it for later. Yet this often does not really work, or the information you want is spread across a large number of web sites. Copying by hand can quickly become very tedious, so it makes sense to use a bit of code to do it.

What you can and cannot scrape: There are, of course, limits to what can be scraped. Some factors that make it harder to scrape a site include:
- Badly formatted HTML code with little or no structural information, e.g. older government websites.
- Authentication systems that are supposed to prevent automatic access, e.g. CAPTCHA codes and paywalls.
- Session-based systems that use browser cookies to keep track of what the user has been doing.
- A lack of complete item listings and possibilities for wildcard search.
- Blocking of bulk access by the server administrators.

Tools that help you scrape: There are many programs that can be used to extract bulk information from a web site, including browser extensions and some web services.

How does a web scraper work? Web scrapers are usually small pieces of code written in a programming language such as Python, Ruby or PHP. Choosing the right language is largely a question of which community you have access to: if there is someone in your newsroom or city already working with one of these languages, then it makes sense to adopt the same language.

The anatomy of a web page: Any HTML page is structured as a hierarchy of boxes (which are defined by HTML "tags"). A large box will contain many smaller ones, for example a table that has many smaller divisions: rows and cells.

An example: scraping nuclear incidents with Python. NEWS is the International Atomic Energy Agency's (IAEA) portal on world-wide radiation incidents (and a strong contender for membership in the Weird Title Club!). The web page lists incidents in a simple, blog-like site that can be easily scraped. To start, create a new Python scraper on ScraperWiki and you will be presented with a text area that is mostly empty, except for some scaffolding code.
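Outside of ScraperWiki, the same kind of scraper can be sketched locally with the requests and BeautifulSoup packages. The URL and the page's markup below are assumptions, so the tag names would need adjusting against the live IAEA site:

```python
import requests
from bs4 import BeautifulSoup

# IAEA NEWS incident listing (URL assumed; check the live site)
URL = "https://www-news.iaea.org/EventList.aspx"

html = requests.get(URL).text
soup = BeautifulSoup(html, "html.parser")

# Walk the hierarchy of boxes: each table row is an incident,
# each cell a field (date, title, rating, ...)
for row in soup.find_all("tr"):
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if cells:
        print(cells)
```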
