When Stanford University computer science professors Daphne Koller and Andrew Ng founded Coursera back in 2012, the mission was relatively simple: Use the Internet to massively expand the reach of educational content. That radical idea blossomed, and today Coursera works with 50 million people, which of course gives the online learning platform a tremendous amount of data to learn from.
Coursera was the world’s largest online learning platform before the coronavirus emerged in China late last year. And now that the rapid spread of COVID-19 has shut down thousands of physical schools around the world and sent teachers scrambling to figure out distance learning on the Internet, the 450-person outfit from Mountain View, California will likely expand its lead in a category suddenly deemed essential.
Coursera’s scale has widened considerably since the early days of the massive open online course (MOOC) movement. In addition to working with individuals seeking to boost their own skills, the company partners with more than 160 universities, some of which use its platform for bachelor’s and graduate degree programs. Coursera also works with about 2,200 enterprises to upskill employees in the lucrative corporate learning market.
The breadth of classes available on Coursera is impressive. There are the requisite courses in data science and computer science – not to mention Ng’s obligatory deep learning course. But Coursera and its partners also tackle the arts, humanities, business, economics, foreign languages, engineering, law, and math. All told, the platform offers 4,000 courses, an ample medium for machine learning to have an impact.
Intro to ML
In Coursera’s early days, it didn’t have much data, so there was not much opportunity to use machine learning, according to Emily Glassberg Sands, senior director of data science at Coursera. But as Coursera grew over the years, it started generating plenty of data from which to extract trends and correlations.
“When I joined Coursera in early 2014, we were doing no machine learning, because we were doing what we needed to do at the time, which was to build out the platform, attract supply from educators, and attract demand from individual learners,” Sands tells Datanami.
“Andrew and Daphne were super forward-looking in that they were collecting the relevant data and starting to build the data infrastructure required to store it so we could use it down the line,” she continues. “But I would say that our machine learning applications have really taken off over the last couple of years as we’ve gotten to a stronger foundation in our core product and have more scale of learners and educators and employers on the platform.”
Sands oversees a data science team of 30 individuals split across three groups: a data engineering group, an inside data science group focused on decision science, and a data science group dedicated to building the models that work with Coursera’s applications.
Coursera currently uses somewhere on the order of 100 models, which are predominantly written in Python using AWS’s SageMaker environment, Sands says. The company relies on a Redshift data warehouse to store business facts, which its data engineering team manages with a healthy dose of SQL.
A Model Learner
During the recent Women in Data Science (WIDS) event at Stanford University, Sands discussed several ways that Coursera is using machine learning to improve its learning system. One of those is a “personalized coach” that helps a learner based on her specific progress in a course, according to Sands, who discussed her WIDS talk separately with Datanami.
“This is really oriented around the need to retain learners, to keep them motivated, to unblock them at their learning, and do that in a fully automated way for the millions of active learners in our MOOC content,” Sands says.
The personalized coach will determine whether an intervention is called for based on a student’s progress, and if so, what intervention should be made. It could be a behavioral intervention, like emphasizing social proof or a growth mindset, Sands says, or it could be a pedagogical intervention, like identifying the best review material given what subject the individual is struggling with.
“How do we keep them motivated? How do we make sure they can access the right review material if they’re stuck?” Sands says. “Different folks with different types of content need different types of interventions, and machine learning is a very good tool when you have the scale of data that we have for building those right interventions for the right folks at the right point in time.”
Coursera also uses machine learning to help match students’ existing skill sets to the new skills they’ll need in their chosen field. For example, the model will help a learner who’s currently a business analyst find the right mix of courses needed to become a data scientist.
“Just like our content is super diverse, our learners are also super diverse,” Sands says. “They come from just about every country in the world. They have a range of different backgrounds and interests and baseline abilities. And so data science plays a critical role in matching folks with the right learning content and product at each stage of their lifecycle.”
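To make the skill-matching idea concrete, here is a minimal sketch of how a course plan could be assembled from a skill gap. It is not Coursera’s model: the skill names, the course catalog, and the `recommend_courses` function are all hypothetical, and a simple greedy pick stands in for whatever the company actually trains.

```python
def recommend_courses(current_skills, target_skills, catalog):
    """Greedily pick courses until the learner's skill gap is covered.

    `catalog` maps a (hypothetical) course name to the set of skills it teaches.
    Returns the chosen courses and any skills no course could cover.
    """
    missing = set(target_skills) - set(current_skills)
    plan = []
    while missing:
        # Pick the course that teaches the most still-missing skills.
        best = max(catalog, key=lambda name: len(catalog[name] & missing))
        gained = catalog[best] & missing
        if not gained:
            break  # nothing in the catalog covers what is left
        plan.append(best)
        missing -= gained
    return plan, missing


# Hypothetical data for a business analyst aiming at a data scientist role.
analyst_skills = {"sql", "excel", "statistics"}
data_scientist_skills = {
    "sql", "statistics", "python", "machine learning", "data visualization",
}
catalog = {
    "Intro to Python": {"python"},
    "Applied ML with Python": {"python", "machine learning"},
    "Visual Storytelling": {"data visualization"},
    "Spreadsheet Basics": {"excel"},
}

plan, uncovered = recommend_courses(analyst_skills, data_scientist_skills, catalog)
print(plan)
```

A production system would presumably weigh course difficulty, learner history, and completion rates as well; the sketch only shows the gap-covering step.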
Another way that Coursera is using machine learning is to help automate much of the day-to-day work that teachers typically perform, but which doesn’t provide much of a value-add to the client. The company is beginning with a model for machine-assisted grading and feedback, Sands says.
“In the context of our degree programs, TAs [teaching assistants] are grading the assignments, but they’re grading assignments across thousands and thousands of learners, and we’d like to accelerate them,” she says.
The company is looking into what it can do here, such as by clustering assignment types for the TA or generating recommended grades or feedback for the TA, Sands says. However, there are inherent barriers to how much machine learning can automate this aspect of Coursera’s work.
“There’s a lot that is uniquely human in that judgment and that feedback,” Sands says. “What we’re really focused on as a platform is automating away the parts of teaching that are mundane, routine, more mechanical, and freeing up time for instructors, TAs, and support counselors to do that judgment work that is so uniquely human.”
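As a rough illustration of the clustering idea, the sketch below groups similar written submissions so a TA could review them side by side. It is an assumption-laden toy rather than Coursera’s approach: the word-overlap (Jaccard) similarity, the 0.5 threshold, and the `group_submissions` function are all chosen here for simplicity.

```python
import re


def tokens(text: str) -> frozenset:
    """Lowercased word set for a submission (a deliberately crude representation)."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Share of distinct words that two submissions have in common."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_submissions(submissions: dict, threshold: float = 0.5) -> list:
    """Greedy grouping: a submission joins the first group whose first
    member is at least `threshold` similar; otherwise it starts a new group."""
    groups = []  # list of (representative word set, [learner ids])
    for learner_id, text in submissions.items():
        words = tokens(text)
        for representative, members in groups:
            if jaccard(words, representative) >= threshold:
                members.append(learner_id)
                break
        else:
            groups.append((words, [learner_id]))
    return [members for _, members in groups]


# Hypothetical short-answer submissions.
submissions = {
    "learner_a": "Gradient descent updates the weights using the gradient of the loss",
    "learner_b": "Gradient descent updates weights using the loss gradient",
    "learner_c": "A decision tree splits the data on feature thresholds",
}
groups = group_submissions(submissions)
print(groups)
```

The grade itself still comes from the TA; the grouping only changes the order in which similar answers are seen.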
The company also maintains a “degree at risk model,” which generates a predicted grade for a student as well as an explanation for why that grade is predicted. This information is served via a dashboard that counselors can access to keep their students on course.
“That doesn’t take away the need for human support,” Sands says. “It accelerates human support, because it allows folks on campus to prioritize the right students to reach out to and to have more information going into that conversation about what support or messages are going to land with that individual.”
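One simple way to pair a predicted grade with an explanation is a linear model whose per-feature contributions can be read off directly. The sketch below assumes exactly that; the feature names, weights, reference values, and `predict_with_explanation` function are hypothetical and are not drawn from Coursera’s model.

```python
# Hypothetical values; a real model would learn these from historical data.
BASELINE_GRADE = 70.0  # predicted grade for the reference student below
REFERENCE = {
    "assignments_submitted_pct": 80.0,
    "avg_quiz_score": 75.0,
    "days_since_last_login": 2.0,
}
WEIGHTS = {
    "assignments_submitted_pct": 0.20,  # grade points per percentage point
    "avg_quiz_score": 0.15,             # grade points per quiz point
    "days_since_last_login": -1.0,      # grade points per day of inactivity
}


def predict_with_explanation(features):
    """Return a predicted grade plus each feature's contribution to it,
    ordered from the factor pulling the grade down most to the one helping most."""
    contributions = {
        name: WEIGHTS[name] * (features[name] - REFERENCE[name])
        for name in WEIGHTS
    }
    predicted = BASELINE_GRADE + sum(contributions.values())
    explanation = sorted(contributions.items(), key=lambda item: item[1])
    return predicted, explanation


# Hypothetical student record as a counselor dashboard might receive it.
student = {
    "assignments_submitted_pct": 50.0,
    "avg_quiz_score": 65.0,
    "days_since_last_login": 12.0,
}
predicted, explanation = predict_with_explanation(student)
print(round(predicted, 1))
for name, points in explanation:
    print(f"{name}: {points:+.1f} points")
```

Sorting the contributions puts the biggest drag on the grade first, which is the kind of detail a counselor could use to decide what to raise in a conversation.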
In the final analysis, there is probably a lot more that Coursera can do with its considerable data. As education shifts online in the wake of the novel coronavirus epidemic, many more learners are likely to look to Coursera for education, and many more schools and universities will look to partner with Coursera.
“There’s a ton that data science can do to improve learning and teaching, and I think at Coursera, because of the platform model, the scale of the platform, the rich data that we’re collecting, and folks who are engaging on it, we have a unique opportunity to do things in pedagogy and teaching and learning that can’t be done in a brick and mortar environment or that couldn’t be done at a smaller scale learning provider,” Sands says.