Presidential Debate Candidate Stance Analysis

By Lisa Singh, Stuart Soroka, and Kornraphop Kawintiranon for the Georgetown University McCourt School of Public Policy

This post looks at the opinions of Twitter users surrounding the first Presidential Debate. We look at content containing at least one debate hashtag, shared immediately before, during, and after the debate; and we determine the “stance” or opinion (for or against) of each tweet towards Biden and Trump.

The figure below shows the average proportion of expressed support or opposition for each candidate in every minute of the debate, from 8pm (20:00) to 11:30pm (23:30). A score above zero indicates a net positive stance towards the candidate; a score below zero indicates a net negative stance.

Presidential Debate 1: Stance of Candidates on Twitter

We see that in the hour before the debate begins, both candidates have a net negative stance. In other words, more opinions against each candidate are being shared than opinions for each candidate. At around the 11-minute mark of the debate (roughly 21:11), pro-Biden expressions begin increasing and continue to increase until the overall stance is in support of Biden. In contrast, around the same time, stance towards Trump decreases and continues to decrease for roughly 10 minutes.

Over the course of the debate there are specific moments that help and hurt each of the candidates. When there is perceived bickering, there is usually a decline in stance for both candidates, although there are exceptions. The moment in which Trump received the most support was when he spoke about judges. Biden’s best moment was when he discussed race relations and the need to support black Americans.

By the end of the debate, the stance of Twitter discussion towards Biden had increased by 0.5 – a striking shift. He clearly benefited from the debate, at least in the short term amongst Twitter users. In contrast, the stance of Twitter discussion towards Trump decreased by approximately 0.2. Although a good deal of opposition towards Trump was expressed immediately before the debate, there was even more negativity towards him at the end of the debate.

It is worth noting that within an hour of the debate the expressed stance towards Trump returned to pre-debate levels. These are decidedly negative, of course; but the additional negative impact of the debate on Twitter discussion of Trump may have been short-lived. The same is not true for Biden. The hours surrounding the debate saw a marked shift in expressed stance towards Biden, from by-minute averages that were anti-Biden to clearly pro-Biden. The shift is evident only 10 minutes into the 90-minute debate, and durable for the hour following the debate as well.

Twitter is by no means an accurate representation of public opinion more broadly – we must be sure to interpret these results as indicating the debate impact on Twitter discussion, not the public writ large. That said, where Twitter is concerned it seems relatively clear that Biden ‘won’ the debate.

Information about the analysis:

This analysis was conducted using approximately 1.3 million tweets that contained at least one of the debate hashtags. We collected posts using the Twitter Streaming API, filtering on the core debate hashtags, e.g. #debates2020, #presidentialdebate2020, etc. We determined whether each tweet showed support, opposition, or neither for each candidate. For each minute, we computed an aggregate stance score as follows: Stance Score = (# Support – # Oppose) / (# of tweets that minute having a stance). To determine stance, we fine-tuned a BERT model with a single additional classification layer on 5 million posts related to the 2020 election. We also had three people label 1,000 tweets with stance to further improve the model.
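That per-minute stance score can be sketched in a few lines (a minimal illustration with hypothetical tweet labels, not the project's actual pipeline):

```python
from collections import defaultdict

def stance_scores(tweets):
    """Compute a per-minute stance score: (support - oppose) / stanced tweets.

    `tweets` is an iterable of (minute, stance) pairs, where stance is
    'support', 'oppose', or 'neither'. Tweets with no stance are excluded
    from the denominator, as in the formula above.
    """
    counts = defaultdict(lambda: {"support": 0, "oppose": 0})
    for minute, stance in tweets:
        if stance in ("support", "oppose"):
            counts[minute][stance] += 1
    return {
        minute: (c["support"] - c["oppose"]) / (c["support"] + c["oppose"])
        for minute, c in counts.items()
    }

# Hypothetical labeled tweets: three stanced tweets in minute 0, one in minute 1.
demo = [(0, "support"), (0, "oppose"), (0, "support"), (0, "neither"), (1, "oppose")]
scores = stance_scores(demo)  # minute 0 is net positive, minute 1 is -1.0
```

A score of +1 or -1 means every stanced tweet in that minute pointed the same way.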

This analysis was conducted by the Political Communications Election 2020 project of the Social Science and Social Media Collaborative (S3MC). The faculty involved in that project include Ceren Budak (University of Michigan), Jonathan Ladd (Georgetown University), Josh Pasek (University of Michigan), Lisa Singh (Georgetown University), Stuart Soroka (University of Michigan), and Michael Traugott (University of Michigan). The work is funded in part by National Science Foundation awards #1934925 and #1934494 and the Massive Data Institute. This project is a collaborative effort by the University of Michigan and Georgetown University to address how to harness the abundance of data from social media in order to understand social and political trends better. For the latest updates about this group’s research related to the 2020 Election, visit the project website: For information about the interdisciplinary methodology being developed by this group, visit:


Joint Image-Text Representations Using Deep Learning 

ICYMI (In Case You Missed It), the following work was presented at the 2020 Annual Meeting of the American Political Science Association (APSA). The presentation, titled “Joint Image-Text Classification Using an Attention-Based LSTM Architecture,” was part of the session “Image Processing for Political Research” on Thursday, September 10, 2020. Post developed by Patrick Wu and Katherine Pearson.

Political science has been enriched by the use of social media data. However, automated text-based classification systems often do not capture image content. Since images provide rich context and information in many tweets, these classifiers do not capture the full meaning of the tweet. In a new paper presented at the 2020 Annual Meeting of the American Political Science Association (APSA), Patrick Wu, Alejandro Pineda, and Walter Mebane propose a new approach for analyzing Twitter data using a joint image-text classifier. 

Human coders of social media data are able to observe both the text of a tweet and an attached image to determine the full meaning of an election incident being described. For example, the authors show the image and tweet below. 

Photo of people waiting to vote and text of tweet reading “Early voting lines in Palm Beach County, Florida #iReport #vote #Florida @CNN”

If only the text is considered, “Early voting lines in Palm Beach County, Florida #iReport #vote #Florida @CNN”, a reader would not be able to tell that the line was long. Conversely, if the image is considered separately from the text, the viewer would not know that it pictured a polling place. It’s only when the text and image are combined that the message becomes clear. 


A new framework called Multimodal Representations Using Modality Translation (MARMOT) is designed to improve data labeling for research on social media content. MARMOT uses modality translation to generate captions of the images in the data, then uses a model to learn the patterns between the text features, the image caption features, and the image features. This is an important methodological contribution because modality translation replaces more resource-intensive processes and allows the model to learn directly from the data, rather than on a separate dataset. MARMOT is also able to process observations that are missing either images or text. 
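The core idea of modality translation (mapping images into text space so that all modalities can be modeled together) can be illustrated with a toy sketch. This is not the MARMOT architecture, which learns joint representations with neural encoders; the bag-of-words featurizer below is a stand-in, and the caption is assumed to come from a separate captioning model:

```python
def bag_of_words(text):
    """Toy text featurizer: lowercase word counts."""
    feats = {}
    for word in text.lower().split():
        feats[word] = feats.get(word, 0) + 1
    return feats

def combine(text, caption):
    """Merge tweet-text features with image-caption features into one
    text-space representation. Caption words get an 'img:' prefix so a
    downstream classifier can learn modality-specific patterns."""
    feats = bag_of_words(text)
    for word, n in bag_of_words(caption).items():
        feats["img:" + word] = feats.get("img:" + word, 0) + n
    return feats

# A tweet whose meaning depends on both modalities: the text names the
# polling place, while the (hypothetical) generated caption conveys the line.
tweet = "Early voting lines in Palm Beach County, Florida"
caption = "a long line of people waiting outside a building"
features = combine(tweet, caption)
```

Because the caption side can simply be empty, this style of representation also degrades gracefully for observations that are missing an image, mirroring MARMOT's ability to handle missing modalities.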


MARMOT was applied to two datasets. The first contained tweets reporting election incidents during the 2016 U.S. general election, originally published in “Observing Election Incidents in the United States via Twitter: Does Who Observes Matter?” All of the tweets in this dataset contain text, and about a third contain images. MARMOT classified these tweets more accurately than the text-only classifier used in the original study. 

In order to test MARMOT against a dataset containing images for every observation, the authors used the Hateful Memes dataset released by Facebook to assess whether a meme is hateful or not. In this case, a multimodal model is useful because it is possible for neither the text nor the image to be hateful, but the combination of the two may create a hateful message. In this application, MARMOT outperformed other multimodal classifiers in terms of accuracy. 

Future Directions 

As more and more political scientists use data from social media in their research, classifiers will have to become more sophisticated to capture all of the nuance and meaning that can be packed into small parcels of text and images. The authors plan to continue refining MARMOT and to expand the model to accommodate additional elements such as video, geographic information, and time of posting. 

Not which ones, but how many?

Perspective on research from Guoer Liu, doctoral student in Political Science, and recipient of the 2019 Roy Pierce Award 

Guoer Liu

“Not which ones, but how many” is a phrase used in list experiment instructions, where researchers tell participants, “After I read all four (five) statements, just tell me how many of them upset you. I don’t want to know which ones, just how many.” In retrospect, I was surprised to see that this phrase encapsulates not only the key research idea but also my fieldwork adventure: not which plans could go awry, but how many. The fieldwork experience was frustrating at times, but it led me to uncharted terrain and brought insights into the research contexts. This valuable exposure would not have been possible without support from the Roy Pierce Award and guidance from Professor Yuki Shiraito.

Research that I conducted with Yuki Shiraito explores the effect of behavior on political attitudes in authoritarian contexts, asking: does voting for autocracy reinforce individual regime support? To answer this question, two conditions need to hold. First, people need to honestly report their level of support before and after voting in authoritarian elections. Second, voting behavior needs to be random. Neither is probable in illiberal autocracies. Our project addresses these methodological challenges with a field experiment in China that combines a list experiment and a randomized encouragement design.

In this study, list experiments are used instead of direct questions to measure respondents’ attitudes towards the regime in the pre- and post-election surveys. The list experiment is a survey technique that mitigates preference falsification by respondents. Although the true preference of each individual respondent remains hidden, the technique allows us to identify the average level of support for the regime within a group of respondents. In addition, we employ a randomized encouragement design in which get-out-the-vote messages are randomly assigned, which helps us estimate the average causal effect of a treatment. For effects moderated by prior support for the regime, we estimate the probability of prior support using individual characteristics and then estimate the effect for prior supporters via a latent variable model.
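The basic list-experiment estimate is a difference in means between the treatment group, which sees the sensitive item added to the list, and the control group. A minimal sketch with made-up responses (this is the textbook estimator, not the authors' latent-variable model):

```python
def list_experiment_estimate(control_counts, treatment_counts):
    """Estimate the proportion holding the sensitive attitude as the
    difference in mean item counts between the treatment group (list plus
    the sensitive item) and the control group (list only). Respondents
    report only how many items upset them, never which ones."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(treatment_counts) - mean(control_counts)

# Hypothetical responses: control sees 4 items, treatment sees the same
# 4 items plus the sensitive one.
control = [1, 2, 2, 3]    # mean 2.0
treatment = [2, 2, 3, 3]  # mean 2.5
estimate = list_experiment_estimate(control, treatment)  # 0.5
```

Here the estimate of 0.5 says that about half of respondents are upset by the sensitive item, even though no individual's answer reveals it.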

While the theoretical part of the project went smoothly and the simulation results were promising, the complications of fieldwork exceeded my expectations. For the list experiment survey, the usually reticent respondents started asking questions about the list questions immediately after the questionnaires were distributed. Their queries took the form of “I am upset by options 1, 2, and 4, so what number should I write down here?” This was not supposed to happen. List experiments were developed to conceal individual respondents’ answers from researchers. By replacing the question of “which ones” with the question of “how many,” respondents’ true preferences are not directly observable, which makes it easier for them to answer sensitive questions honestly. Respondents’ eagerness to tell me their options directly defeated the purpose of this design. I later learned from other researchers that the problem I encountered is common in list experiment implementations, regardless of research context or type of respondent. 

The rationale behind respondents’ desire to share their individual options, despite being given a chance to hide them, is thought-provoking. Is it because of the cognitive burden of answering a list question, which is not a familiar question format for respondents? Or is it because the sensitive items, despite careful construction, raise the alarm? Respondents are eager to specify their stance on each option and identify themselves as regime supporters: they do not leave any room for misinterpretation. To ease the potential cognitive burden, we will try a new way of implementing the list experiment in a similar project on preference falsification in Japan. We look forward to seeing whether it improves respondents’ comprehension of the list question setup. The second explanation is more concerning, however. It suggests limits on the scope conditions under which list experiments are a valid tool for eliciting truthful answers. Other, more implicit tools, such as endorsement experiments, may be more appropriate in those contexts for gauging respondents’ preferences. 

Besides the intricacies of the list experiment, carrying out an encouragement design on the ground was challenging. We had to modify the behavioral intervention to accommodate the needs of our local collaborators, and the realized sample size was only a fraction of the initially negotiated size. Despite the compromises, the implementation was imbued with uncertainty: meetings were postponed or rescheduled at the last minute, and instructions from local partners were sometimes inconsistent or conflicting. The frustration was certainly real. But the pain made me cognizant of the judgment calls researchers have to make behind the scenes. The amount of effort required to produce reliable data is admirable. And as a consumer of data, I should always interpret data with great caution.

While the pilot study did not directly produce a significant finding, the research experience and the methods we developed have informed the design of a larger project that we are currently conducting in Japan.

I always thought of doing research as establishing a series of logical steps between a question and an answer. Before I departed for the pilot study, I made a detailed timeline for the project with color-coded tasks, flourish-shaped arrows pointing at milestones of the upcoming fieldwork. When I presented this plan to Professor Shiraito, he smiled and told me that “when doing research, it is generally helpful to think of the world in two ways: the ideal world and the real world. You should be prepared for both.” Wise words. Because of this, I am grateful for the Roy Pierce Award for offering the opportunity to catch a glimpse of the real world. And I am indebted to Professor Shiraito for helping me see the potential of attaining the ideal world with intelligence and appropriate tools.

ESC Center Tackles Ethical Questions about Tech 

Post developed by Katherine Pearson

Christian Sandvig, the Director of the new Center for Ethics, Society, and Computing (ESC), says he developed the new center “to reconcile the fact that I love computers, but I’m horrified by some of the things we do with them.” ESC is dedicated to intervening when digital media and computing technologies reproduce inequality, exclusion, corruption, deception, racism, or sexism. The center was officially launched at an event on January 24, 2020. Video of the event is available here.

The associate director of ESC, Silvia Lindtner, elaborated on ESC’s mission at the event. “I’ve learned over the years not to shy away from talking about things that are uncomfortable,” she said. “This includes talking about things like sexism, racism, and various forms of exploitation – including how this involves us as researchers, and how we’ve experienced these ourselves.” 

ESC is sponsored by the University of Michigan School of Information, Center for Political Studies (CPS), and the Department of Communication and Media. CPS Director Ken Kollman called the new center “an exciting, interdisciplinary effort to ask and address challenging questions about technology, power, and inequality.” Thomas Finholt, Dean of the School of Information, said, “if you look at the world around us there are a seemingly unlimited number of examples where individual leaders or contributors would have benefitted dramatically from the themes this center is going to take on.” 

The wide range of disciplines represented among the ESC faculty is essential to its mission. “To have people in computer science, engineering, social science, and humanities interacting together on questions about the impacts of technology strikes me as the kind of necessary, but all too rare, collaborative efforts for generating new ideas and insights,” Kollman said. 

Christian Sandvig, Thomas Finholt, and Sylvia Lindtner cut the ribbon to launch the ESC Center

The launch event consisted of two panel discussions featuring notable experts in technology and its applications. The first panel, “Accountable Technology — An Oxymoron?” explored the ways that big companies, the media, and individual consumers of technology hold the tech industry accountable for issues of equity and fairness. Pulitzer Prize-winning journalist Julia Angwin highlighted journalists’ role in investigating and framing coverage of tech, including her work to launch a publication dedicated to investigating the technology industry. Jen Gennai, the Google executive responsible for ethics, fielded questions from the audience about accountability. danah boyd, Principal Researcher at Microsoft Research and founder of Data & Society, and Marc DaCosta, co-founder and chairman of Enigma, rounded out the panel, which was moderated by Sandvig. 

During the second panel, “Culture After Tech Culture — Unimaginable?” Silvia Lindtner, Holly Okonkwo, Michaelanne Dye, Monroe Price, Shobita Parthasarathy, and André Brock debated the inevitability of technology’s impact on culture, and how the future might be reimagined. The panelists challenged the audience to think of technology from the perspectives of different cultures around the world, rather than as a single monolithic entity. Questions from the audience interrogated the ways tech could be more inclusive. 

ESC organizers encourage students and faculty to get involved with the new center. A series of mixers to get to know ESC are scheduled through the spring. 

Computer simulations reveal partisan gerrymandering 

Post developed by Katherine Pearson 

How much does partisanship explain how legislative districts are drawn? Legislators commonly agree on neutral criteria for drawing district lines, but the extent to which partisan considerations overshadow these neutral criteria is often the subject of intense controversy.

Jowei Chen developed a new way to analyze legislative districts and determine whether they have been unfairly gerrymandered for partisan reasons. Chen, an Associate Professor of Political Science and a Research Associate at the Center for Political Studies, used computer simulations to produce thousands of non-partisan districting plans that follow traditional districting criteria. 

Simulated NC map

These simulated district maps formed the basis of Chen’s recent expert court testimony in Common Cause v. Lewis, a case in which plaintiffs argued that North Carolina state legislative district maps drawn in 2017 were unconstitutionally gerrymandered. By comparing the non-partisan simulated maps to the existing districts, Chen was able to show that the 2017 districts “cannot be explained by North Carolina’s political geography.” 

The simulated maps ignored all partisan and racial considerations. North Carolina’s General Assembly adopted several traditional districting criteria for drawing districts, and Chen’s simulations followed only these neutral criteria, including: equalizing population, maximizing geographic compactness, and preserving political subdivisions such as county, municipal, and precinct boundaries. By holding constant all of these traditional redistricting criteria, Chen determined that the 2017 district maps could not be explained by factors other than the intentional pursuit of partisan advantage. 

Specifically, when compared to the simulated maps, Chen found that the 2017 districts split far more precincts and municipalities than was reasonably necessary, and were significantly less geographically compact than the simulations. 

By disregarding these traditional standards, the 2017 House Plan was able to create 78 Republican-leaning districts out of 120 total; the Senate Plan created 32 Republican-leaning districts out of 50. 

Using data from 10 recent elections in North Carolina, Chen compared the partisan leanings of the simulated districts to the actual ones. Every one of the simulated maps based on traditional criteria created fewer Republican-leaning districts. In fact, the 2017 House and Senate plans were extreme statistical outliers, demonstrating that partisanship predominated over the traditional criteria in those plans. 
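The outlier comparison can be sketched numerically: count the Republican-leaning districts in each simulated plan, then ask what fraction of the ensemble matches or exceeds the enacted plan. The ensemble counts below are made up for illustration; this is not Chen's simulation code:

```python
def outlier_rank(enacted, ensemble):
    """Fraction of simulated plans with at least as many Republican-leaning
    districts as the enacted plan. A value at or near zero marks the enacted
    plan as an extreme pro-Republican outlier relative to the ensemble."""
    return sum(1 for n in ensemble if n >= enacted) / len(ensemble)

# Hypothetical ensemble of 1,000 nonpartisan plans for a 120-seat chamber,
# clustered in the low 70s; the enacted plan has 78 Republican-leaning seats.
ensemble = [70] * 250 + [71] * 300 + [72] * 300 + [73] * 150
rank = outlier_rank(78, ensemble)  # 0.0: no simulated plan reaches 78
```

A rank of 0.0 is the ensemble-based signature of an extreme statistical outlier: none of the nonpartisan plans produces as many Republican-leaning districts as the enacted map.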

The judges agreed with Chen’s analysis that the 2017 maps displayed Republican bias, compared to the maps he generated by computer that left out partisan and racial considerations. On September 3, 2019, the state court struck down the maps as unconstitutional and enjoined their use in future elections. 

The North Carolina General Assembly rushed to adopt new district maps by the court’s deadline of September 19, 2019. To simplify the process, legislators agreed to use Chen’s computer-simulated maps as a starting point for the new districts. The legislature even selected randomly from among Chen’s simulated maps in an effort to avoid possible accusations of political bias in its new redistricting process.

Determining whether legislative maps are fair will be an ongoing process involving courts and voters across different states. But in recent years, the simulation techniques developed by Chen have been repeatedly cited and relied upon by state and federal courts in Pennsylvania, Michigan, and elsewhere as a more scientific method for measuring how much districting maps are gerrymandered for partisan gain. 

Accuracy in Reporting on Public Policy

Post developed by Katherine Pearson and Stuart Soroka

ICYMI (In Case You Missed It), the following work was presented at the 2019 Annual Meeting of the American Political Science Association (APSA).  The presentation, titled “Media (In)accuracy on Public Policy, 1980-2018” was a part of the session “Truth and/or Consequences” on Sunday, September 1, 2019.

Citizens can be well-informed about public policy only if the media accurately present information on the issues. Today’s media environment is faced with valid concerns about misinformation and biased reporting, but inaccurate reporting is nothing new. In their latest paper, Stuart Soroka and Christopher Wlezien analyze historical data on media coverage of defense spending to measure the accuracy of the reporting when compared to actual spending. 

In order to measure reporting on defense spending, Soroka and Wlezien compiled the text of media reports between 1980 and 2018 from three corpora: newspapers, television transcripts, and public affairs-focused Facebook posts. Using the Lexis-Nexis Web Services Kit, they developed a database of sentences focused on defense spending from the 17 highest-circulation newspapers in the United States. Similar data were compiled from transcripts of the three major television broadcasters (ABC, CBS, NBC) and cable news networks (CNN, MSNBC, and Fox). Although more difficult to gather, data from the 500 top public affairs-oriented public pages on Facebook were compiled for the years 2010 through 2017. 

Soroka and Wlezien estimated the policy signal conveyed by the media sources by measuring the extent to which the text suggests that defense spending has increased, decreased, or stayed the same. Comparing this directly to actual defense spending over the same time period reveals the accuracy of year-to-year changes in the media coverage. For example, if media coverage were perfectly accurate, the signal would be exactly the same as actual changes in spending. 

As the figure below shows, the signal is not perfect. While there are some years when the media coverage tracks very closely to actual spending, there are other years when there is a large gap between the signal that news reports send and the defense budget. The gap may not entirely represent misinformation, however. In some of these cases, the media may be reporting on anticipated future changes in spending. 
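Once the media signal and actual spending changes are on the same scale, the per-year gap is a simple subtraction. A toy sketch with hypothetical yearly changes (estimating the media signal from text is, of course, the hard part):

```python
def yearly_gaps(media_signal, actual_change):
    """Per-year gap between the media-implied change in defense spending
    and the actual change. A large gap flags potentially inaccurate
    coverage, or coverage of anticipated future changes."""
    return [m - a for m, a in zip(media_signal, actual_change)]

# Hypothetical yearly percent changes, media-implied vs. actual.
media = [2.0, -1.0, 3.5, 0.5]
actual = [1.8, -0.9, 1.0, 0.6]
gaps = yearly_gaps(media, actual)
largest = max(abs(g) for g in gaps)  # the year with the biggest discrepancy
```

In this made-up series, most years track closely and one year diverges sharply, which is the same qualitative pattern the authors report.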

media signal

For most years, the gap representing misinformation is fairly small. Soroka and Wlezien note that this “serves as a warning against taking too seriously arguments focused entirely on the failure of mass media.” This analysis shows evidence that media coverage can inform citizens about policy change. 

The authors conclude that there are both optimistic and pessimistic interpretations of the results of this study. On one hand, for all of the contemporary concerns about fake news, it is still possible to get an accurate sense of changes in defense spending from the media, which is good news for democratic citizenship. However, they observed a wide variation in accuracy among individual news outlets, which is a cause for concern. Since long before the rise of social media, citizens have been at risk of consuming misinformation based on the sources they select.