Machine learning and political research might seem unrelated at first: one is a subset of computer science and artificial intelligence, while the other is essentially applied political science, and applied social science in general. But the two are quite similar. Both seek to understand human and social behavior and to draw useful patterns from it. Together, machine learning and political research can facilitate a better understanding of human and social behavior and yield detailed, actionable patterns.
Carnegie Mellon University provides a useful definition of machine learning: “The field of Machine Learning seeks to answer the question ‘How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?’” Dr. Roman Yampolskiy of the University of Louisville provides another, more succinct definition: “Machine Learning is the science of getting computers to learn as well as humans do or better.” The benefits of developing and optimizing computers capable of learning include increasing our capabilities to study the world and the systems that comprise it and generally to make life easier. See the term’s written use over time using the Google Books Ngram Viewer below:
[Google Books Ngram Viewer, Accessed 8/11/2018]
Political research offers a broad array of definitions and examples ranging from formal academic research following the scientific method adapted to the social sciences to more discrete and applied political research like opposition research and election polling. Two hallmarks of political research are that it includes: (1) analyses of politics, ranging from international relations to U.S. electoral politics; and (2) methodologies that guide the research and produce conclusions.
Any discussion of machine learning and political research would be incomplete without a third term that binds the two: big data. IBM defines big data as the “term applied to data sets whose size or type is beyond the ability of traditional relational databases to capture, manage, and process the data with low-latency.” According to IBM, big data also “has one or more of the following characteristics – high volume, high velocity, or high variety,” and it “comes from sensors, devices, video/audio, networks, log files, transactional applications, web, and social media - much of it generated in real time and in a very large scale.” Oracle helpfully summarizes this: “Put simply, big data is larger, more complex data sets, especially from new data sources.” Everyone from Barack Obama’s 2012 presidential campaign to the United Nations has employed big data and analytics. Big data is crucial to the synergy of machine learning and political research because traditional statistical methods were designed for the smaller data sets political researchers worked with in the past; data at today’s scale calls for machine learning. Read more about the differences between machine learning and traditional statistical modeling here.
Consider the example of Obama’s 2012 reelection campaign, which employed big data to previously unheard-of success in reaching $1 billion in fundraising. Time’s article on the data analysts who helped propel Obama into his second term shows how effective big data and high-powered computational methods can be in the political realm. The campaign relied on big data to discover that George Clooney could motivate female West Coast voters in their 40s to donate via a contest that awarded dinner with, first and foremost, Clooney but also Obama. The campaign replicated that success on the East Coast with a Sarah Jessica Parker dinner contest. These effective, albeit unusual, dinner fundraising contests came from various data inputs, including “affection for contests, small dinners and celebrity,” that the Obama campaign recognized and utilized to great effect, according to the Time article. With an analytics department five times larger than the 2008 campaign’s, and with data-mining and analytics experiments codenamed things like “Dreamcatcher” and “Narwhal,” Obama’s campaign leveraged big data to fundraise, get out the vote, and simulate the election results. The campaign tackled one of its thorniest challenges by combining loads of separate lists and databases into one massive database. The combined database worked wonders for the campaign. For example, Time wrote:
Call lists in field offices, for instance, didn’t just list names and numbers; they also ranked names in order of their persuadability, with the campaign’s most important priorities first. About 75% of the determining factors were basics like age, sex, race, neighborhood and voting record. Consumer data about voters helped round out the picture. ‘We could [predict] people who were going to give online. We could model people who were going to give through mail. We could model volunteers,’ said one of the senior advisers about the predictive profiles built by the data.
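The ranking described in that excerpt can be sketched in a few lines of code. The weights, factors, and voter records below are entirely hypothetical, invented for illustration; the campaign’s actual models are not public.

```python
# Hypothetical sketch of ranking a call list by a "persuadability" score.
# Each voter's known factors are weighted and summed into a single score,
# and the list is sorted with the most persuadable names first.

def persuadability(voter, weights):
    """Combine demographic and behavioral factors into one score."""
    return sum(weights[factor] * value for factor, value in voter["factors"].items())

# Invented weights: basics like age and voting record dominate, with
# consumer data rounding out the picture, echoing the Time excerpt.
weights = {"age_match": 0.3, "voting_record": 0.25,
           "neighborhood": 0.2, "consumer_data": 0.25}

call_list = [
    {"name": "Voter A", "factors": {"age_match": 0.9, "voting_record": 0.2,
                                    "neighborhood": 0.5, "consumer_data": 0.4}},
    {"name": "Voter B", "factors": {"age_match": 0.4, "voting_record": 0.9,
                                    "neighborhood": 0.8, "consumer_data": 0.7}},
]

# Most persuadable first, as the field-office call lists were ordered.
ranked = sorted(call_list, key=lambda v: persuadability(v, weights), reverse=True)
print([v["name"] for v in ranked])  # → ['Voter B', 'Voter A']
```

A real campaign model would learn these weights from historical data (for example, with logistic regression) rather than setting them by hand, but the output is the same in spirit: a call list ordered by predicted responsiveness.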
Obama’s campaign also tested different messaging techniques with different appeals and authors to determine the ideal combination to elicit donations. “Michelle Obama’s e-mails performed best in the spring, and at times, campaign boss Messina performed better than Vice President Joe Biden,” Time wrote. This information formed part of the feedback loop that enabled the campaign to repurpose data from one segment into actionable intelligence on another front, like get-out-the-vote initiatives. Time wrote:
Online, the get-out-the-vote effort continued with a first-ever attempt at using Facebook on a mass scale to replicate the door-knocking efforts of field organizers. In the final weeks of the campaign, people who had downloaded an app were sent messages with pictures of their friends in swing states. They were told to click a button to automatically urge those targeted voters to take certain actions, such as registering to vote, voting early or getting to the polls. The campaign found that roughly 1 in 5 people contacted by a Facebook pal acted on the request, in large part because the message came from someone they knew.
While the Time article on Obama’s 2012 campaign focuses on big data and data analytics over pure machine learning, the message is clear: data-driven decision-making has clear advantages. Obama’s campaign invested in data analytics, and it helped earn him a second term in the highest office in the United States. Even just six years later in 2018, computational power has increased mightily, which increases the potential effectiveness of incorporating machine learning into political research and data analytics. Big data is important, as mentioned previously, because machine learning algorithms can sort through massive amounts of information far more quickly and easily than people can.
Machine learning can be thought of as the means to translate relatively unstructured and vast sources of information like words, images, videos, speeches, signs and symbols into insightful, comprehensive, revelatory knowledge. Data, and big data, can be thought of as the fuel: the inputs required for machine learning to actually learn and to produce knowledge, often in the form of predictive analysis. But how? Machine learning uses algorithms to perform problem-solving calculations. Algorithms are processes for accomplishing tasks and solving problems, much like recipes or instruction manuals. Machine learning algorithms vary in kind and in performance, but they all require inputs from which to learn, whether sample inputs labeled by humans or sensory data the computer collects itself. More inputs intensify the learning process, giving algorithms more information on which to base future decisions. It’s like how experience functions for people. We learn that touching a hot stove will burn us, and we apply that knowledge to future actions to make sure we don’t get burned. Applications of machine learning vary from programming computers to learn how to play checkers better in 1959 to more current uses in the healthcare field. Machine learning applications in everyday life and in political research include:
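The idea of learning from labeled inputs can be made concrete with one of the simplest algorithms there is: a nearest-neighbor classifier, which labels a new observation by finding the most similar example it has already seen. The data below is invented for illustration; a real project would use a library such as scikit-learn.

```python
# Minimal sketch of supervised learning: a 1-nearest-neighbor classifier.
# The training pairs are the "experience" the algorithm learns from;
# adding more pairs gives it more information for future decisions.

def predict(training_data, point):
    """Label a new point with the label of its closest training example."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    closest = min(training_data, key=lambda example: distance(example[0], point))
    return closest[1]

# Hypothetical (features, label) pairs, e.g. two traits of past contacts
# and whether they donated.
training = [
    ((1.0, 1.0), "donor"),
    ((1.2, 0.8), "donor"),
    ((5.0, 5.0), "non-donor"),
    ((5.5, 4.5), "non-donor"),
]

print(predict(training, (1.1, 0.9)))  # → donor
print(predict(training, (5.2, 4.8)))  # → non-donor
```

Just as the hot-stove analogy suggests, the classifier’s only “knowledge” is its accumulated examples: feed it more labeled inputs and its future predictions improve.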
| Application | Real-World Examples | Political Research Examples |
| --- | --- | --- |
| Recommender systems | Netflix, Amazon, Spotify, Indeed | Recommending political candidates based on user-entered or collected information |
| Natural language processing/speech recognition | Auto-correct/autocomplete, Grammarly, Skype Translator, virtual digital assistants (Siri, Bixby, etc.) | Identifying candidates/incumbents by their voices and speeches; identifying speech patterns and buzzwords |
| Computer vision | Video stabilization, self-driving autonomous vehicles | Identifying candidates/incumbents in images and videos; identifying patterns of images used in political marketing, donor outreach |
| Text/sentiment/behavior analysis | Monitoring brand reputation, Radian6 and Salesforce, Twitter | Identifying patterns in texts, partisan marketing materials, political speeches, solicitation emails |
| Internet of Things | Smart appliances, Nest, Fitbit, Tile, BigBelly, Google Glass, smartphones, security cameras | Use your imagination |
[University of Georgia, 1/5/2017]
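To make the table’s text/sentiment-analysis row concrete, here is a toy lexicon-based scorer for a hypothetical solicitation email. The word lists and sample text are invented; production systems use learned models rather than hand-built lists.

```python
# Toy sentiment scorer: count positive words minus negative words.
# POSITIVE/NEGATIVE lexicons here are invented for illustration only.

POSITIVE = {"win", "hope", "together", "strong"}
NEGATIVE = {"lose", "fear", "attack", "weak"}

def sentiment(text):
    """Return a crude sentiment score for a piece of campaign text."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(sentiment("Together we are strong and we will win"))  # → 3
print(sentiment("fear and attack"))                         # → -2
```

Scoring thousands of emails or speeches this way, and then correlating scores with donation rates, is the kind of pattern-finding the table’s political research column describes.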
Stanford University Political Scientist Justin Grimmer provides another example of leveraging machine learning and big data in political research. According to The New York Times, Grimmer’s research “involves the computer-automated analysis of blog postings, Congressional speeches and press releases, and news articles, looking for insights into how political ideas spread.” Similarly, Stanford University Political Scientist Adam Bonica leveraged big data and machine learning in his 2013 analysis measuring “the ideology of candidates and contributors using campaign finance data” from “a data set of over 100 million contribution records from state and federal elections” to estimate the “ideal points for an expansive range of political actors.”
Like everything, using machine learning and big data in political research has its pros and cons. Drawbacks of machine learning applications and big data in political research include: (1) false positives that may lead people to infer correlation or causation between things that are actually unrelated; (2) the sheer availability of data to cherry-pick or manipulate in support of pre-determined conclusions; (3) the ability of machine learning to reproduce human biases; and (4) the ever-present concerns about privacy. Benefits of using machine learning and big data in political research include the potential to increase voter turnout, to better understand the actions of politicians, candidates and political parties, and even to better understand the variables that contribute to violent conflicts in order to mitigate them.
At some point, combining machine learning and big data with political research raises the question: is it worth it? Or at least, is it worth the hype? The reality of machine learning, big data and political research in political science and any social science may be more muted than the current wave of enthusiasm suggests. As Caltech Political Scientist R. Michael Alvarez wrote in the Oxford University Press blog, “So is big data a big deal in political science? The answer is neither yes nor no.” It’s not really novel after all. Political scientists like Keith T. Poole of the University of Georgia and the University of California San Diego and Howard Rosenthal of New York University and Princeton University have been working with big data for decades. While using data and machine learning applications may not be as new as is perhaps popularly conceived, what is new is the massive amount of available data and the computational power to apply machine learning to it. Those developments feed into the increasing utilization of machine learning and big data in many facets of personal and professional life. Machine learning, like all technologies, matters most in terms of how and why people apply it. Machine learning in political research has the potential for invasive, predatory uses, but so does pretty much all internet-enabled technology, and that doesn’t mean we shouldn’t use it. It means we must vigilantly apply human intelligence and awareness to our digital endeavors to ensure their ethical and accurate applications.