Joint Image-Text Representations Using Deep Learning 

Post developed by Patrick Wu and Katherine Pearson.

ICYMI (In Case You Missed It), the following work was presented at the 2020 Annual Meeting of the American Political Science Association (APSA). The presentation, titled “Joint Image-Text Classification Using an Attention-Based LSTM Architecture,” was part of the session “Image Processing for Political Research” on Thursday, September 10, 2020.

Political science has been enriched by the use of social media data. However, automated text-based classification systems often do not capture image content. Since images provide rich context and information in many tweets, these classifiers do not capture the full meaning of the tweet. In a new paper presented at the 2020 Annual Meeting of the American Political Science Association (APSA), Patrick Wu, Alejandro Pineda, and Walter Mebane propose a new approach for analyzing Twitter data using a joint image-text classifier. 

Human coders of social media data are able to observe both the text of a tweet and an attached image to determine the full meaning of an election incident being described. For example, the authors show the image and tweet below. 

Photo of people waiting to vote and text of tweet reading “Early voting lines in Palm Beach County, Florida #iReport #vote #Florida @CNN”

If only the text is considered, “Early voting lines in Palm Beach County, Florida #iReport #vote #Florida @CNN”, a reader would not be able to tell that the line was long. Conversely, if the image is considered separately from the text, the viewer would not know that it pictured a polling place. It’s only when the text and image are combined that the message becomes clear. 

MARMOT

A new framework called Multimodal Representations Using Modality Translation (MARMOT) is designed to improve data labeling for research on social media content. MARMOT uses modality translation to generate captions of the images in the data, then uses a model to learn the patterns between the text features, the image caption features, and the image features. This is an important methodological contribution because modality translation replaces more resource-intensive processes and allows the model to learn directly from the data rather than from a separate dataset. MARMOT is also able to process observations that are missing either images or text.
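
To make the general idea concrete, here is a minimal sketch in Python (PyTorch) of a joint fusion model in the spirit of MARMOT; it is not the authors’ implementation. It assumes the tweet text, the generated image caption, and the image have already been encoded into fixed-length feature vectors by pretrained encoders, and the class and dimension names are purely illustrative.

# A minimal sketch of the MARMOT idea, not the authors' implementation.
# It assumes the tweet text, a generated image caption, and the image have
# already been encoded into fixed-length feature vectors (for example, by a
# pretrained language model and a pretrained vision model); the fusion module
# below simply learns patterns across the three feature streams.
import torch
import torch.nn as nn

class JointFusionClassifier(nn.Module):
    def __init__(self, dim: int = 256, n_classes: int = 2):
        super().__init__()
        # One learned projection per modality: text, image caption, image.
        self.proj_text = nn.Linear(dim, dim)
        self.proj_caption = nn.Linear(dim, dim)
        self.proj_image = nn.Linear(dim, dim)
        # Self-attention over the three modality "tokens" lets each stream
        # condition on the others before classification.
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, text, caption, image):
        # Stack the three projected modalities as a length-3 sequence.
        tokens = torch.stack(
            [self.proj_text(text), self.proj_caption(caption), self.proj_image(image)],
            dim=1,
        )
        fused, _ = self.attn(tokens, tokens, tokens)
        # Pool across modalities and classify.
        return self.classifier(fused.mean(dim=1))

# Toy usage with random feature vectors for a batch of 8 tweets.
model = JointFusionClassifier()
text = torch.randn(8, 256)
caption = torch.randn(8, 256)
image = torch.randn(8, 256)
logits = model(text, caption, image)   # shape: (8, 2)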

Applications

MARMOT was applied to two datasets. The first dataset contained tweets reporting election incidents during the 2016 U.S. general election, originally published in “Observing Election Incidents in the United States via Twitter: Does Who Observes Matter?” All of the tweets in this dataset contain text, and about a third also contain images. MARMOT classified these tweets more accurately than the text-only classifier used in the original study.

To test MARMOT against a dataset containing images for every observation, the authors used the Hateful Memes dataset released by Facebook, in which the task is to assess whether a meme is hateful. A multimodal model is useful here because neither the text nor the image may be hateful on its own, yet the combination of the two can create a hateful message. In this application, MARMOT outperformed other multimodal classifiers in terms of accuracy.

Future Directions 

As more and more political scientists use data from social media in their research, classifiers will have to become more sophisticated to capture all of the nuance and meaning that can be packed into small parcels of text and images. The authors plan to continue refining MARMOT and to expand the model to accommodate additional elements such as video, geographical information, and time of posting.

Not which ones, but how many?

Perspective on research from Guoer Liu, doctoral student in Political Science, and recipient of the 2019 Roy Pierce Award 

Guoer Liu

“Not which ones, but how many” is a phrase used in list experiment instructions, where researchers instruct participants, “After I read all four (five) statements, just tell me how many of them upset you. I don’t want to know which ones, just how many.” In retrospect, I was surprised to see that this phrase encapsulates not only the key research idea but also my fieldwork adventure: not which plans could go awry, but how many. The fieldwork experience was frustrating at times, but it led me to uncharted terrain and brought insights into the research context. This valuable exposure would not have been possible without support from the Roy Pierce Award and guidance from Professor Yuki Shiraito.

Research that I conducted with Yuki Shiraito explores the effect of behavior on political attitudes in authoritarian contexts to answer the question: does voting for autocracy reinforce individual regime support? To answer this question, two conditions need to hold. First, people need to honestly report their level of support before and after voting in authoritarian elections. Second, voting behavior needs to be random. Neither condition is likely in illiberal autocracies. Our project addresses these methodological challenges by conducting a field experiment in China that combines a list experiment and a randomized encouragement design.

In this study, list experiments are used instead of direct questions to measure respondents’ attitudes towards the regime in the pre- and post-election surveys. The list experiment is a survey technique that mitigates preference falsification by respondents. Although the true preference of each individual respondent remains hidden, the technique allows us to identify the average level of support for the regime within a group of respondents. In addition, we employ a randomized encouragement design in which get-out-the-vote messages are randomly assigned, which helps us estimate the average causal effect of the treatment. For effects moderated by prior support for the regime, we estimate the probability of prior support from individual characteristics and then estimate the effect for prior supporters via a latent variable model.
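
For readers unfamiliar with the technique, the sketch below shows the standard difference-in-means estimator behind a list experiment, written in Python with simulated numbers; it is not our latent-variable analysis. The control group counts four baseline items, the treatment group counts those items plus the sensitive one, and the difference in mean counts estimates the share of respondents holding the sensitive attitude.

# A minimal sketch of the standard list-experiment estimator (difference in
# means), using simulated counts rather than real survey data.
import numpy as np

rng = np.random.default_rng(0)

# Simulated reported counts: control sees 4 items, treatment sees 5 items.
control = rng.integers(0, 5, size=500)      # 0..4 items reported as upsetting
treatment = rng.integers(0, 6, size=500)    # 0..5 items reported as upsetting

# The sensitive item's prevalence is the difference in mean counts.
prevalence = treatment.mean() - control.mean()
se = np.sqrt(treatment.var(ddof=1) / len(treatment)
             + control.var(ddof=1) / len(control))
print(f"estimated prevalence: {prevalence:.3f} (SE {se:.3f})")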

While the theoretical part of the project went smoothly and the simulation results were promising, the complications of fieldwork exceeded my expectations. For the list experiment survey, the usually reticent respondents started asking questions about the list questions immediately after the questionnaires were distributed. Their queries took the form of “I am upset by options 1, 2, and 4, so what number should I write down here?” This was not supposed to happen. List experiments are designed to conceal individual respondents’ answers from researchers. By replacing the question of “which ones” with the question of “how many,” respondents’ true preferences are not directly observable, which makes it easier for them to answer sensitive questions honestly. Respondents’ eagerness to tell me their options directly defeated the purpose of this design. Later I learned from other researchers that the problem I encountered is common in list experiment implementations, regardless of research context and type of respondent.

The rationale behind respondents’ desire to share their individual options despite being given a chance to hide them is thought-provoking. Is it because of the cognitive burden of answering a list question, which is not a familiar type of question for respondents? Or is it because the sensitive items, despite careful construction, raise alarm? Respondents were eager to specify their stance on each option and identify themselves as regime supporters: they did not leave any room for misinterpretation. To ease the potential cognitive burden, we will try a new way of implementing the list experiment in a similar project on preference falsification in Japan. We look forward to seeing whether it improves respondents’ comprehension of the list question setup. The second explanation is more concerning, however. It suggests a scope condition on list experiments as a valid tool for eliciting truthful answers from respondents. Other, more implicit tools, such as endorsement experiments, may be more appropriate in those contexts for gauging respondents’ preferences.

Besides the intricacies of the list experiment, carrying out an encouragement design on the ground is challenging. We had to modify the behavioral intervention to accommodate the needs of our local collaborators, and the realized sample size was only a fraction of the initially negotiated size. Despite the compromises, the implementation was imbued with uncertainty: meetings were postponed or rescheduled at the last minute, and instructions from local partners were sometimes inconsistent or conflicting. The frustration was certainly real. But the pain made me cognizant of the judgment calls researchers have to make behind the scenes. The amount of effort required to produce reliable data is admirable. And as a consumer of data, I should always interpret data with great caution.

While the pilot study did not lead directly to a significant finding, the research experience and the methods we developed have informed the design of a larger project that we are currently conducting in Japan.

I always thought of doing research as establishing a series of logical steps between a question and an answer. Before I departed for the pilot study, I made a detailed timeline for the project, with color-coded tasks and flourish-shaped arrows pointing at milestones of the upcoming fieldwork. When I presented this plan to Professor Shiraito, he smiled and told me, “When doing research, it is generally helpful to think of the world in two ways: the ideal world and the real world. You should be prepared for both.” Wise words. I am grateful to the Roy Pierce Award for offering the opportunity to catch a glimpse of the real world. And I am indebted to Professor Shiraito for helping me see the potential of attaining the ideal world with intelligence and the appropriate tools.

ESC Center Tackles Ethical Questions about Tech 

Post developed by Katherine Pearson

Christian Sandvig, the Director of the new Center for Ethics, Society, and Computing (ESC), says he developed this new center “to reconcile the fact that I love computers, but I’m horrified by some of the things we do with them.” ESC is dedicated to intervening when digital media and computing technologies reproduce inequality, exclusion, corruption, deception, racism, or sexism. The center was officially launched at an event on January 24, 2020. Video of the event is available here.

The associate director of ESC, Silvia Lindtner, elaborated on ESC’s mission at the event. “I’ve learned over the years not to shy away from talking about things that are uncomfortable,” she said. “This includes talking about things like sexism, racism, and various forms of exploitation – including how this involves us as researchers, and how we’ve experienced these ourselves.” 

ESC is sponsored by the University of Michigan School of Information, Center for Political Studies (CPS), and the Department of Communication and Media. CPS Director Ken Kollman called the new center “an exciting, interdisciplinary effort to ask and address challenging questions about technology, power, and inequality.” Thomas Finholt, Dean of the School of Information, said, “if you look at the world around us there are a seemingly unlimited number of examples where individual leaders or contributors would have benefitted dramatically from the themes this center is going to take on.” 

The wide range of disciplines represented among the ESC faculty is essential to its mission. “To have people in computer science, engineering, social science, and humanities interacting together on questions about the impacts of technology strikes me as the kind of necessary, but all too rare, collaborative efforts for generating new ideas and insights,” Kollman said. 

Christian Sandvig, Thomas Finholt, and Silvia Lindtner cut the ribbon to launch the ESC Center

The launch event comprised two panel discussions featuring notable experts in technology and its applications. The first panel, “Accountable Technology — An Oxymoron?” explored the ways that big companies, the media, and individual consumers of technology hold the tech industry accountable on issues of equity and fairness. Pulitzer Prize-winning journalist Julia Angwin highlighted journalists’ role in investigating and framing coverage of tech, including her work to launch a publication dedicated to investigating the technology industry. Jen Gennai, a Google executive responsible for ethics, fielded questions from the audience about accountability. danah boyd, Principal Researcher at Microsoft Research and founder of Data & Society, and Marc DaCosta, co-founder and chairman of Enigma, rounded out the panel, which was moderated by Sandvig.

During the second panel, “Culture After Tech Culture — Unimaginable?” Silvia Lindtner, Holly Okonkwo, Michaelanne Dye, Monroe Price, Shobita Parthasarathy, and André Brock debated the inevitability of technology’s impact on culture and how the future might be reimagined. The panelists challenged the audience to think of technology from the perspectives of different cultures around the world, rather than as a single monolithic entity. Questions from the audience interrogated the ways that tech could be more inclusive.

ESC organizers encourage students and faculty to get involved with the new center. A series of mixers to get to know ESC is scheduled through the spring.

Computer simulations reveal partisan gerrymandering 

Post developed by Katherine Pearson 

How much does partisanship explain how legislative districts are drawn? Legislators commonly agree on neutral criteria for drawing district lines, but the extent to which partisan considerations overshadow these neutral criteria is often the subject of intense controversy.

Jowei Chen developed a new way to analyze legislative districts and determine whether they have been unfairly gerrymandered for partisan reasons. Chen, an Associate Professor of Political Science and a Research Associate at the Center for Political Studies, used computer simulations to produce thousands of non-partisan districting plans that follow traditional districting criteria. 

Simulated NC map

These simulated district maps formed the basis of Chen’s recent expert court testimony in Common Cause v. Lewis, a case in which plaintiffs argued that North Carolina state legislative district maps drawn in 2017 were unconstitutionally gerrymandered. By comparing the non-partisan simulated maps to the existing districts, Chen was able to show that the 2017 districts “cannot be explained by North Carolina’s political geography.” 

The simulated maps ignored all partisan and racial considerations. North Carolina’s General Assembly adopted several traditional districting criteria for drawing districts, and Chen’s simulations followed only these neutral criteria, including: equalizing population, maximizing geographic compactness, and preserving political subdivisions such as county, municipal, and precinct boundaries. By holding constant all of these traditional redistricting criteria, Chen determined that the 2017 district maps could not be explained by factors other than the intentional pursuit of partisan advantage. 

Specifically, when compared to the simulated maps, Chen found that the 2017 districts split far more precincts and municipalities than was reasonably necessary, and were significantly less geographically compact than the simulations. 

By disregarding these traditional standards, the 2017 House Plan was able to create 78 Republican-leaning districts out of 120 total; the Senate Plan created 32 Republican-leaning districts out of 50. 

Using data from 10 recent elections in North Carolina, Chen compared the partisan leanings of the simulated districts to the actual ones. Every one of the simulated maps based on traditional criteria created fewer Republican-leaning districts. In fact, the 2017 House and Senate plans were extreme statistical outliers, demonstrating that partisanship predominated over the traditional criteria in those plans. 
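
The logic of this outlier comparison can be illustrated with a short Python sketch using made-up numbers; it is not Chen’s actual simulation code. The enacted plan’s count of Republican-leaning districts is simply placed within the distribution of counts produced by the nonpartisan simulated plans.

# A minimal sketch of the outlier comparison, with made-up numbers standing in
# for the Republican-leaning district counts of the simulated House plans.
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical: counts of Republican-leaning districts in 1,000 simulated
# nonpartisan plans (120 districts each), and in the enacted 2017 plan.
simulated_counts = rng.integers(62, 74, size=1000)   # every simulation below 78
enacted_count = 78

share_as_extreme = np.mean(simulated_counts >= enacted_count)
print(f"simulated range: {simulated_counts.min()}-{simulated_counts.max()}")
print(f"share of simulations with >= {enacted_count} R-leaning districts: "
      f"{share_as_extreme:.3f}")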

The judges agreed with Chen’s analysis that the 2017 maps displayed Republican bias, compared to the maps he generated by computer that left out partisan and racial considerations. On September 3, 2019, the state court struck down the maps as unconstitutional and enjoined their use in future elections. 

The North Carolina General Assembly rushed to adopt new district maps by the court’s deadline of September 19, 2019. To simplify the process, legislators agreed to use Chen’s computer-simulated maps as a starting point for the new districts. The legislature even selected randomly from among Chen’s simulated maps in an effort to avoid possible accusations of political bias in its new redistricting process.

Determining whether legislative maps are fair will be an ongoing process involving courts and voters across different states. But in recent years, the simulation techniques developed by Chen have been repeatedly cited and relied upon by state and federal courts in Pennsylvania, Michigan, and elsewhere as a more scientific method for measuring how much districting maps are gerrymandered for partisan gain. 

Accuracy in Reporting on Public Policy

Post developed by Katherine Pearson and Stuart Soroka

ICYMI (In Case You Missed It), the following work was presented at the 2019 Annual Meeting of the American Political Science Association (APSA).  The presentation, titled “Media (In)accuracy on Public Policy, 1980-2018” was a part of the session “Truth and/or Consequences” on Sunday, September 1, 2019.

Citizens can be well-informed about public policy only if the media accurately present information on the issues. Today’s media environment faces valid concerns about misinformation and biased reporting, but inaccurate reporting is nothing new. In their latest paper, Stuart Soroka and Christopher Wlezien analyze historical data on media coverage of defense spending to measure the accuracy of the reporting when compared to actual spending.

To measure reporting on defense spending, Soroka and Wlezien compiled the text of media reports between 1980 and 2018 from three corpora: newspapers, television transcripts, and public affairs-focused Facebook posts. Using the Lexis-Nexis Web Services Kit, they developed a database of sentences focused on defense spending from the 17 newspapers with the highest circulation in the United States. Similar data were compiled from transcripts of the three major television broadcasters (ABC, CBS, NBC) and cable news networks (CNN, MSNBC, and Fox). Although more difficult to gather, data from the 500 top public affairs-oriented public pages on Facebook were compiled for the years 2010 through 2017.
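
The sentence-level filtering step can be sketched in a few lines of Python; this is a generic illustration with an invented keyword pattern, not the authors’ Lexis-Nexis pipeline or coding rules.

# A minimal sketch of filtering article text down to sentences that mention
# defense spending; the keyword pattern here is illustrative only.
import re

KEYWORDS = re.compile(r"\b(defense|military)\b.*\b(spending|budget)\b", re.IGNORECASE)

def defense_spending_sentences(article_text: str) -> list[str]:
    """Split raw text into sentences and keep those mentioning defense spending."""
    sentences = re.split(r"(?<=[.!?])\s+", article_text)
    return [s for s in sentences if KEYWORDS.search(s)]

# Toy usage.
sample = ("Congress debated the farm bill. The defense budget rose sharply. "
          "Analysts expect military spending to fall next year.")
print(defense_spending_sentences(sample))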

Soroka and Wlezien estimated the policy signal conveyed by the media sources by measuring the extent to which the text suggests that defense spending has increased, decreased, or stayed the same. Comparing this signal directly to actual defense spending over the same period reveals how accurately media coverage tracks year-to-year changes in spending. If media coverage were perfectly accurate, the signal would match actual changes in spending exactly.
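
The comparison itself can be illustrated with a short Python sketch using invented numbers; it is not the authors’ exact measure. Each sentence is coded as suggesting an increase, decrease, or no change; the codes are averaged into a yearly media signal; and that signal is compared with the actual year-over-year change in spending.

# A minimal sketch of comparing a yearly media signal to actual spending
# changes, using invented data in place of the coded corpus.
import numpy as np

rng = np.random.default_rng(2)
years = np.arange(1980, 2019)

# Hypothetical inputs: actual percent change in defense spending per year, and
# the average sentence code (+1 up, -1 down, 0 same) per year from the corpus.
actual_change = rng.normal(0, 3, size=len(years))
media_signal = actual_change + rng.normal(0, 2, size=len(years))  # noisy signal

gap = media_signal - actual_change            # per-year (in)accuracy
corr = np.corrcoef(media_signal, actual_change)[0, 1]
print(f"signal-spending correlation: {corr:.2f}; "
      f"mean absolute gap: {np.abs(gap).mean():.2f}")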

As the figure below shows, the signal is not perfect. While there are some years when the media coverage tracks very closely to actual spending, there are other years when there is a large gap between the signal that news reports send and the defense budget. The gap may not entirely represent misinformation, however. In some of these cases, the media may be reporting on anticipated future changes in spending. 

media signal

For most years, the gap representing misinformation is fairly small. Soroka and Wlezien note that this “serves as a warning against taking too seriously arguments focused entirely on the failure of mass media.” This analysis shows evidence that media coverage can inform citizens about policy change. 

The authors conclude that there are both optimistic and pessimistic interpretations of the results of this study. On one hand, for all of the contemporary concerns about fake news, it is still possible to get an accurate sense of changes in defense spending from the media, which is good news for democratic citizenship. However, they observed a wide variation in accuracy among individual news outlets, which is a cause for concern. Since long before the rise of social media, citizens have been at risk of consuming misinformation based on the sources they select. 

Using Text and Images to Examine 2016 Election Tweets

Post developed by Dory Knight-Ingram 

ICYMI (In Case You Missed It), the following work was presented at the 2019 Annual Meeting of the American Political Science Association (APSA).  The presentation, titled “Using Neural Networks to Classify Based on Combined  Text and Image Content: An Application to Election Incident Observation” was a part of the session “Deep Learning in Political Science” on Friday, August 30, 2019.

A new election forensics process developed by Walter Mebane and Alejandro Pineda uses machine learning to examine not just the text but also the images of Twitter posts considered to be reports of “incidents” from the 2016 US Presidential Election.

Mebane and Pineda show how to combine text and images into a single supervised learner for prediction in US politics using a multi-layer perceptron. The paper notes that in election forensics, polls are useful, but social media data may offer more extensive and granular coverage. 

The research team gathered individual observation data from Twitter in the months leading up to the 2016 US Presidential Election. Between Oct. 1 and Nov. 8, 2016, the team used Twitter APIs to collect millions of tweets, arriving at more than 315,180 tweets that apparently reported one or more election “incidents” – an individual’s report of their personal experience with some aspect of the election process.

At first, the research team used only the text associated with tweets. But the researchers note that sometimes the images in a tweet are informative while the text is not. The text alone may not make a tweet a report of an election incident, while the attached image may indeed show an incident.

To solve this problem, the research team implemented “deep neural network classifier methods that use both text and images associated with tweets. The network is constructed such that its text-focused parts learn from the image inputs, and its image-focused parts learn from the text inputs. Using such a dual-mode classifier ought to improve performance. In principle our architecture should improve performance classifying tweets that do not include images as well as tweets that do,” they wrote.

The authors write, “Automating analysis for digital content proves difficult because the form of data takes so many different shapes. This paper offers a solution: a method for the automated classification of multi-modal content.” The research team’s model “takes image and text as input and outputs a single classification decision for each tweet – two inputs, one output.”
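
A minimal Python (PyTorch) sketch of a two-input, one-output classifier in this spirit appears below; it is not the authors’ architecture. It assumes text and image feature vectors have already been extracted, concatenates them, and passes them through a multi-layer perceptron that emits a single incident/non-incident decision. All names and dimensions are illustrative.

# A minimal sketch of a two-input, one-output multimodal classifier, assuming
# pre-extracted text and image feature vectors; not the authors' model.
import torch
import torch.nn as nn

class TwoInputMLP(nn.Module):
    def __init__(self, text_dim: int = 300, image_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(text_dim + image_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 64),
            nn.ReLU(),
            nn.Linear(64, 1),   # one output: logit that the tweet reports an incident
        )

    def forward(self, text_features, image_features):
        # Concatenate the two modalities and classify jointly.
        return self.mlp(torch.cat([text_features, image_features], dim=-1))

# Toy usage with random features for a batch of 16 tweets.
model = TwoInputMLP()
logits = model(torch.randn(16, 300), torch.randn(16, 512))  # shape: (16, 1)
probs = torch.sigmoid(logits)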

The paper describes in detail how the research team processed and analyzed tweet images, which included loading image files in batches, restricting image types to .jpeg or .png, and using small image sizes for better data processing results.
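
The kind of preprocessing described here can be sketched as follows; the exact sizes, file handling, and batching scheme the authors used are not reproduced, and the function name and defaults are illustrative.

# A minimal sketch of image preprocessing: filter by file type, resize small,
# scale pixel values, and stack a batch into one array.
from pathlib import Path

import numpy as np
from PIL import Image

def load_image_batch(paths, size=(64, 64)):
    """Load .jpeg/.png files, resize them, and stack into one float array."""
    batch = []
    for p in paths:
        if Path(p).suffix.lower() not in {".jpg", ".jpeg", ".png"}:
            continue  # skip unsupported file types
        img = Image.open(p).convert("RGB").resize(size)
        batch.append(np.asarray(img, dtype=np.float32) / 255.0)
    return np.stack(batch) if batch else np.empty((0, *size, 3))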

The results were mixed.

The researchers trained two models using a sample of 1,278 tweets: one combined text and images, and the other used text only. In the text-only model, accuracy steadily increased, eventually reaching 99%. “Such high performance is testimony to the power of transfer learning,” the authors wrote.

However, the team was surprised that including the images substantially worsened performance. “Our proof-of-concept combined classifier works. But the model structure and hyperparameter details need to be adjusted to enhance performance. And it’s time to mobilize hardware superior to what we’ve used for this paper. New issues will arise as we do that.”