Machine Learning and Higher Education

Software is eating the world, so said Marc Andreessen in 2011.1 These days it seems that machine learning and its specialized algorithms are eating the software world.2 Is it thus a foregone conclusion that machine learning will play a significant role in disrupting technology and shaping our future?

Machine learning concerns teaching machines to learn about something without explicit programming. At the core of machine learning is the idea of modeling and extracting useful information out of data. Societal trends clearly point to data as the resource of the future. Colleges and universities are already swimming in data, and there is much more on the way. Imagine a future in which computers are everywhere and interconnected with everything from clothes to refrigerators, phones, vending machines, and more. Some people have even proposed equipping toilets with sensors that collect data.3 Storing those data will be very cheap.4 These interconnected devices will produce quantities of data that are too large for human analysis, requiring us to teach computers to look for patterns in the data, identify predictor variables, and even make predictions from those variables.

Organizations that adapt and adopt machine learning will have a bright future. Machine learning is a new tool in the box, and it is worth learning how to use.5 Colleges, universities, and other educational institutions often adopt disruptive technologies in novel ways and are therefore in a good position to use machine learning to improve higher education. Adopting a machine learning–centric data-science approach as a tool for administrators and faculty could be a game changer for higher education.

Before we discuss machine learning further, it is important to briefly discuss analytics and traditional statistics. Not all predictive analytics needs to be done with machine learning. The traditional methods here are statistical methods such as time series forecasting or various forms of regression. These have been used successfully in many fields for many years. In this article, at a very high level, we use analytics to mean predictive analytics: the subfield of machine learning that relies on training algorithms with a labeled training set, otherwise known as supervised learning. A common example is weather.6 Suppose we are interested in predicting sunny days. We can do this by feeding the recorded conditions for each day in our data set into an algorithm, along with a label saying whether that day was sunny. Once the model is trained, it can be fed new data and make guesses about whether a day will be sunny. For our purposes, we are interested in using supervised methods to make predictions and unsupervised methods such as clustering to find patterns in the data that we might not have seen.
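To make the sunny-day example concrete, here is a minimal sketch of supervised learning in Python. It assumes each day is described by two made-up measurements (humidity and cloud cover) and uses a k-nearest-neighbor vote as the algorithm. The data, the feature choice, and the names `TRAINING_DAYS` and `predict` are all hypothetical, and kNN is only one of many algorithms that could fill this role.

```python
from collections import Counter
import math

# Hypothetical labeled training set: (humidity %, cloud cover %) -> label.
# Every value here is invented for illustration.
TRAINING_DAYS = [
    ((30.0, 10.0), "sunny"),
    ((35.0, 20.0), "sunny"),
    ((40.0, 15.0), "sunny"),
    ((80.0, 90.0), "not sunny"),
    ((85.0, 75.0), "not sunny"),
    ((90.0, 95.0), "not sunny"),
]


def predict(conditions, training=TRAINING_DAYS, k=3):
    """Label new conditions by majority vote among the k closest training days."""
    by_distance = sorted(training, key=lambda day: math.dist(day[0], conditions))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(predict((38.0, 12.0)))  # a dry, clear day
    print(predict((88.0, 85.0)))  # a humid, overcast day
```

The labels are what make this supervised: the algorithm is told which past days were sunny and generalizes from them to days it has not seen.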

It is important to discuss the potential benefits and recommendations for pursuing machine learning as a tool for educational experts. In addition, it is important to note potential limitations and ethical considerations. Although an in-depth discussion is beyond the scope of this article, our hope is to start a conversation among higher education administrators, faculty, and IT specialists regarding the potential of machine learning to help make more-informed and better decisions — in other words, get people interested in machine learning to try it and see how things go. We are practicing what we advocate in this article. Heath Yates is actively exploring new algorithmic approaches to machine learning, while Craig Chamberlain is applying machine learning to data in higher education.

Potential Benefits of Machine Learning in Higher Education

Our interest in machine learning began by doing some very simple clustering analysis parallel to k-nearest neighbor (kNN). Techniques such as kNN can help analysts find patterns in larger data sets. During the 2016–17 academic year, Chamberlain was approached by his university to look at a question posed by a donor: “Can we identify a group of students who need an additional scholarship that would eventually lead to increased retention?” After spending time with several data sets and after a lot of research, Chamberlain and his team identified a group of students who needed additional money to remain enrolled. At the time, many believed that increasing retention for this group was a long shot. However, after awarding these students additional scholarships, retention rose from approximately 64% to about 90%. This effort has had two distinct benefits. The most important is that it contributed to the continued success of those students. The second is that it resulted in about $200,000 in additional net tuition revenue from an investment of about $50,000 in scholarships. By conducting basic machine learning to find patterns in the data and testing hypotheses, Chamberlain and his team were able to help students and the university. Although this use case is simple and nascent and relied on some traditional statistical inference, as machine learning and education begin interacting more often, simple examples like this one can evolve into solutions built on much larger data sets.

Although analytics is relatively widespread, we believe higher education has barely scratched the surface of the potential for machine learning. At the same time, we do not mean to suggest that no one is doing this kind of work. Rather, we believe there is room to grow in this area. Because Chamberlain works as an analyst in higher education, specifically enrollment management, he has seen substantial market potential for data science and machine learning. From student recruitment and success to curricular modeling and student-to-faculty ratios, large quantities of data go unused. Across the country, only a few consultants are using data science to assess student recruitment and success, which often results in a one-size-fits-all approach to recruiting, awarding financial aid, and measuring student success. Each graduating high school senior has numerous data points to assess, including location, grades, and parent income. Machine learning can assess data for each student and determine the likelihood that the student will enroll. Once a student enrolls, even more data points can be assessed, such as the living situation, grade on the first calculus exam, and major. Using machine learning, universities can then home in on student retention and persistence and identify factors that influence student success.
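As a rough illustration of estimating the likelihood that a student will enroll, the sketch below fits a small logistic regression with plain gradient descent. Everything in it is an assumption made for the example: the two features (GPA and distance from campus), the invented records in `APPLICANTS`, and the function names. A real project would use many more data points per student and an established library rather than hand-written training code.

```python
import math

# Hypothetical admitted students: (GPA on a 4.0 scale, distance from campus in
# hundreds of miles), and whether each one enrolled (1) or not (0).
APPLICANTS = [
    ((3.9, 0.2), 1), ((3.5, 0.5), 1), ((3.2, 0.3), 1), ((3.7, 1.0), 1),
    ((3.0, 4.0), 0), ((3.4, 5.5), 0), ((2.8, 3.0), 0), ((3.6, 6.0), 0),
]


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def enrollment_probability(features, weights, bias):
    """Estimated probability of enrollment under the fitted model."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(data, learning_rate=0.1, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    n_features = len(data[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, enrolled in data:
            # Prediction error for this student under the current weights.
            error = enrollment_probability(features, weights, bias) - enrolled
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(data)
    return weights, bias


if __name__ == "__main__":
    weights, bias = train(APPLICANTS)
    for student in [(3.5, 0.4), (3.5, 5.0)]:
        print(student, round(enrollment_probability(student, weights, bias), 3))
```

In this invented data set, distance from campus is what separates enrolled from non-enrolled students, so the model learns to give nearby applicants a higher probability. With real institutional data, the fitted weights would indicate which factors matter instead.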

Machine learning could potentially be used to look for patterns on a campus-wide level. Are there conditional probabilities or cluster analyses that suggest a pattern for passing a statistics course? Suppose, for example, that students who earn high marks in math classes are more likely to pass a statistics course. This seems obvious, but machine learning can provide a methodology to confirm or refute this belief.
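The statistics-course question can be phrased as a conditional probability and checked directly against records. The sketch below is a minimal illustration under stated assumptions: the eight records in `RECORDS` are invented, and the helper `pass_rate` is a hypothetical name, not part of any library.

```python
# Hypothetical student records: (earned high marks in math, passed statistics).
RECORDS = [
    (True, True), (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False), (False, False),
]


def pass_rate(records, high_math):
    """Estimate P(passed statistics | high_math) from observed counts."""
    outcomes = [passed for math_flag, passed in records if math_flag == high_math]
    if not outcomes:
        raise ValueError("no students in this group")
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    print("P(pass | high math marks):", pass_rate(RECORDS, True))
    print("P(pass | lower math marks):", pass_rate(RECORDS, False))
```

A large gap between the two rates would support the belief, while similar rates would count against it. With real data, an analyst would also want enough students in each group and a significance test before drawing conclusions.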

How could university leadership use this information to increase retention and student success? Consider, for example, a correlation between taking particular courses that are not prerequisites for statistics and doing well in statistics. Using machine learning in exploratory data analysis might help find these kinds of patterns.

Kansas City uses machine learning to prevent potholes before they even form.7 Colleges and universities could consider using machine learning as a preventative tool as well. If an institution maintains detailed records on IT purchases and equipment, machine learning could be applied to IT equipment maintenance or maintenance in general.

To make better business decisions, higher education is going to need machine learning and people who understand these algorithms. Currently, many universities do not have a chief data scientist or a team of experts to apply machine learning in an official capacity. Therefore, many universities are missing opportunities that machine learning provides. We suspect that the institutions that are using machine learning are not talking about it much, and we encourage them to reach out to us and others to share their successes and challenges.

Recommendations for Adopting Machine Learning

Getting started with machine learning is not as difficult as some might imagine or claim. Universities, colleges, and other educational institutions are in a good position to adopt, start, grow, and implement machine learning projects, given their access to faculty who have mathematical, statistical, and computer science backgrounds. We offer the following high-level recommendations on how to implement machine learning projects at the university level.

Set Clear Expectations of Institutional Needs, Goals, and Requirements

Administrators and faculty should brainstorm about institutional needs that machine learning can help address. Start small with a very narrow question. For example, it might be useful to predict who is most likely to pass a certain difficult class. Are there discernible patterns that can help predict which students will pass calculus? Can machine learning predict enrollment in specific classes? Similarly, are there patterns in institutional data that can help predict which students are likely to earn degrees by using clustering analysis of some type? Also, be sure to have a goal for the potential findings. How can the university use these results to enhance students’ success, boost retention, and increase student enrollment?

Temper with Realism

Find out if a faculty member or other expert at your institution or nearby can offer an informed opinion about whether the questions being asked can be answered: can the problem be solved by machine learning? Some problems are easy and inexpensive to solve, and others are not. If the answer is no, consult with the expert and go back to the drawing board. Make sure you have individuals who can do the proposed work, typically someone with a mathematics, statistics, and programming background. Industry refers to individuals who possess this combination of skills as “magical unicorns.”8 This is where being in higher learning pays off: these talents are usually close by, if not in one person then definitely in a group of people. The challenge for administrators is to be the bridge that lets people cross traditional boundaries and to make sure the people involved pursue this as an interdisciplinary approach on behalf of the institution.

Consider Finances

Can your institution afford to hire a full-time data scientist? In many cases, this might not be an option, given the salaries that such individuals command.9 A reasonable alternative is to put together a diverse, interdisciplinary team of volunteers who agree to do this work. The cost, so to speak, is whatever time the institution is willing to commit by taking members of that team away from other activities. It is also important to be aware that some investment in computing or storage may be required, but depending on how up-to-date your institution’s technology infrastructure is, this might not be necessary.

Realize That This Work Takes Time and Can Be Complicated

Being realistic about the overhead upfront tempers expectations. Your institution may have loads of historical data, but they might be in legacy systems or there may be technical hurdles that make it difficult to access those data easily. The value is there, but it may take time to come up with a solution that is easy for everyone to use. Depending on the questions administrators or faculty ask, it may also take some time to do proper data analysis. The key is to be patient and strategic. If you commit to doing machine learning, play the long game.

Understand That Security and Privacy Are Paramount to Machine Learning

There are likely local, state, and federal guidelines and laws that educational institutions must adhere to in order to safeguard their data. Before moving further on machine learning, all data should be as secure as possible. In addition, privacy of individuals must be protected. Most industries process, clean, and store data so that no individually discernable information may be gleaned out of it.
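One common cleaning step is to strip direct identifiers from each record and replace them with a pseudonym before analysis. The sketch below shows the idea with Python's standard `hmac` and `hashlib` modules. The field names, the `SECRET_KEY` value, and the `pseudonymize` helper are hypothetical, and this step alone does not guarantee anonymity or legal compliance, because remaining fields can still identify a person in combination.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored apart from the data.
SECRET_KEY = b"replace-with-a-secret-kept-outside-the-data"

# Hypothetical column names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "email", "student_id"}


def pseudonymize(record, key=SECRET_KEY):
    """Return a copy of record with direct identifiers replaced by a keyed hash."""
    student_id = record["student_id"].encode("utf-8")
    token = hmac.new(key, student_id, hashlib.sha256).hexdigest()
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["pseudonym"] = token[:16]  # shortened token, for readability only
    return cleaned


if __name__ == "__main__":
    record = {
        "student_id": "S0001",
        "name": "Example Student",
        "email": "student@example.edu",
        "gpa": 3.4,
        "major": "Biology",
    }
    print(pseudonymize(record))
```

Because the hash is keyed, the same student always maps to the same pseudonym, so records can still be linked across data sets without exposing the original ID to analysts.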

Do Machine Learning

The combination of imaginative, creative, and capable people means that new applications, innovations, and benefits are being found very quickly. If you do not have a group of individuals who can currently do machine learning, then find people who are interested and invest in them. There are many online learning resources that teach data science. Many of these resources are free, and some offer certifications at a reasonable price. The barrier to entry is lower than you might think. Many of the programming and data science tools are free.

Ethical Considerations and Limitations

While we believe in the future of machine learning, it always pays to be cautious when adopting new technology. Machine learning is powerful. As the saying goes, with great power comes great responsibility. Often, in the excitement, it can be easy to lose sight of the downsides of a new technology or tool. Machine learning provides analysts and decision makers with previously undreamed-of powers due to its ability to find patterns, make predictions, and draw inferences. The examples below can serve as cautionary tales and motivate questions regarding ethical considerations and the potential limitations of machine learning. In other words, just because we can do something does not imply or suggest we must. Any machine learning project should respect the institution’s policies and mission.

Respect Privacy

One of the earliest examples of using machine learning in predictive analytics came about from an incident in which Target sent coupons to a woman it determined was likely to be pregnant. The story goes something like this: An indignant father went to Target and complained to management that it had sent coupons addressed to his teenage daughter with advertisements for maternity clothing and baby furniture. The store management apologized, but the father later contacted them and offered his own apology when he learned his daughter was indeed pregnant.10 How can machine learning techniques determine that a woman is pregnant? The short answer is that we are creatures of habit. Therefore, human behavior and patterns in data collected by companies can be used to identify emerging trends, such as pregnancy.11 The open question is whether we always should.

Consider the Implications

More recently, machine-learning techniques were used to infer the sexual orientation of individuals based on their facial features using data from dating websites. Specifically, two researchers from Stanford University trained an AI system to detect patterns in facial features and use them to identify the sexual orientation of a randomly chosen man (with 81% accuracy) and a randomly chosen woman (with 71% accuracy).12 This is much higher than the reported capability of humans. This research has generated questions of whether such capabilities mean it is also possible to infer a person’s political orientation and IQ from their appearance.13

Currently, these are cutting-edge findings and research. Frankly, we are somewhat skeptical of the findings and see in them a possible new form of pseudoscience. We present them as likely examples of cases where machine learning should not have been applied in the first place. Machine learning should serve the mission of higher education to reduce bias and prejudice in human society, not potentially promote it.

Insist on Appropriate Goals

Some controversial work has appeared on detecting criminality based on facial features. Researchers reported that it was possible to infer criminality based solely on still face images using common machine learning techniques.14 Academia and media alike harshly criticized the findings as a new form of craniometrics and pseudoscience.15 One risk of data science is that it can produce hard-to-understand artificial intelligence systems built on questionable or pseudoscientific ideas.

These examples lead to one invariable and fundamental conclusion regarding the ethical implications of machine learning: We must be careful that machine learning is not abused, resulting in either intentional or unintentional biases or exclusionary analyses, predictions, and artificial intelligence systems. Some nascent research demonstrates that this is possible — research has shown that political leanings can influence how an artificial intelligence system might pick synonyms for political hashtags.16

These examples are not intended to create fear or dissuade readers from pursuing machine learning. Rather, we hope to generate a positive discussion about machine learning and how it can be carefully, responsibly, and maturely applied. Colleges, universities, and other educational institutions should define clear standards so that machine learning projects do not violate ethical standards and stay true to institutional goals and high standards. In fact, this is an opportunity for higher education to lead society by doing things the right way. One way to address many of these issues is to recruit a diverse, inclusive team of experts to analyze data carefully in an ethical and sound way. This is an easy and natural strength for universities, colleges, and other academic organizations.


Machine learning shows great potential to disrupt how we process and consume data and use software. Serious ethical considerations and limitations must be weighed. However, higher education is naturally and uniquely positioned to capitalize on the promise of machine learning by using it as a tool for social and moral good. Higher education has the opportunity not only to use machine learning to help transform itself to make better decisions but also to explore how it might apply machine learning as a force for good. How can machine learning relate to and benefit higher education? Considering the trend towards automation in technology as a guide, we believe that the answer, ultimately, is in everything.
