I had an opportunity to work on a project to uncover insights about reviews on ratemyprofessors.com. Apparently, over 4 million students use the rating system for class selection every year. Most of us already know that a professor’s final rating is just the average of all student ratings. However, few understand what truly goes into and drives those ratings. In addition, reviews can be highly inconsistent from student to student: a rating of 5 from student A can mean something different to student B. Ratings can also easily skew toward either the positive or the negative end of the spectrum. Just think about your personal experience. Do you tend to remember your most satisfying or most miserable experiences better than your average ones?
Objectives: The project aims to uncover insights and identify the key drivers of ratings. This can benefit three groups of users. For professors, it can enhance self-awareness and support continuous improvement. For students, ratings can be better understood and used more effectively for class registration. For schools, ratings could even help inform hiring decisions.
Data: I collected data from ratemyprofessors.com. To ensure the data covered a meaningful range of universities and geographic regions, I chose three universities: University of California, Berkeley; New York University; and University of Florida. The initial data set contained about 16K records. On closer inspection, I noted that most records lacked either a valid rating or a meaningful number of reviews, which would add a lot of uncertainty to the model. After cleaning up and filtering out unusable data, 5K records remained and were used for the Linear Regression analysis.
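The filtering step can be sketched roughly as follows. This is only an illustration, not the actual cleaning code: the column names `rating` and `num_reviews`, the 1-to-5 validity range, and the `min_reviews` threshold are all assumptions made for the example.

```python
import pandas as pd

def filter_usable(df: pd.DataFrame, min_reviews: int = 5) -> pd.DataFrame:
    """Keep rows with a valid rating and at least `min_reviews` reviews.

    Column names and the threshold are hypothetical, chosen for illustration.
    """
    # Coerce non-numeric ratings (e.g. "N/A") to NaN so they fail the range check.
    rating = pd.to_numeric(df["rating"], errors="coerce")
    valid_rating = rating.between(1, 5)          # NaN evaluates to False
    enough_reviews = df["num_reviews"] >= min_reviews
    cleaned = df.assign(rating=rating).loc[valid_rating & enough_reviews]
    return cleaned.reset_index(drop=True)

# Tiny made-up sample: only professor "A" has a valid rating and enough reviews.
raw = pd.DataFrame({
    "professor": ["A", "B", "C", "D"],
    "rating": ["4.5", "N/A", "3.2", "0"],
    "num_reviews": [12, 30, 2, 8],
})
clean = filter_usable(raw)
print(clean)
```

Making the threshold a parameter keeps the trade-off visible: a stricter `min_reviews` gives more reliable ratings but fewer records.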
Preparation: Before the detailed analysis, I plotted Seaborn pair plots to gain a high-level understanding of the relationships among variables. I then split the data into training and test sets. This kept the test data separate and untouched, so it could be used later to test model quality. In addition, because not all features were on a comparable scale, I normalized them using transformations and StandardScaler.
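A minimal sketch of the split-then-scale step, using synthetic data in place of the real features. The 80/20 split ratio and the random seeds are assumptions. The point it illustrates is that the scaler is fitted on the training set only, so no information from the test set leaks into preprocessing.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: two features on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(loc=[3.0, 50.0], scale=[1.0, 20.0], size=(200, 2))
y = rng.uniform(1, 5, size=200)

# Hold out 20% of the rows as an untouched test set (ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training data only, then apply it to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```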
Analysis: Once the data was normalized, I used StatsModels to get a general impression of the features. With the StatsModels summary, I was able to quickly filter out features that had an immaterial impact on the rating. However, too many features still remained, so I decided to use Lasso regression to drill down further. Lasso penalizes the absolute size of the coefficients, which can shrink the coefficients of unhelpful features to exactly zero and so helps eliminate unnecessary ones. To optimize the hyperparameters alpha and degree for the Lasso model, I went through multiple GridSearchCV iterations, using cross-validation along the way to make the most of the training data. Once the optimal alpha and degree were identified, the feature coefficients were determined accordingly. Finally, I assessed the model’s R squared, which indicates a moderate signal between the key features and the rating.
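One way such a search might look in scikit-learn is sketched below. It assumes that "degree" refers to the degree of polynomial features fed into the Lasso model. The pipeline layout, the parameter grid, and the synthetic data are illustrative assumptions, not the project's actual setup.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic training data: only the first two of six features drive the target.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 6))
y_train = 2.0 * X_train[:, 0] - 1.5 * X_train[:, 1] + rng.normal(scale=0.5, size=300)

# Polynomial expansion -> scaling -> Lasso, tuned as one unit so that each
# cross-validation fold refits the preprocessing on its own training portion.
pipe = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("scale", StandardScaler()),
    ("lasso", Lasso(max_iter=10_000)),
])

# Hypothetical grid; real values would be refined over several iterations.
param_grid = {"poly__degree": [1, 2], "lasso__alpha": [0.01, 0.1, 1.0]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)

# Coefficients of the refitted best model; zeros mark eliminated features.
coefs = search.best_estimator_.named_steps["lasso"].coef_
print("best params:", search.best_params_)
print("cross-validated R^2:", round(search.best_score_, 3))
print("non-zero coefficients:", int(np.sum(coefs != 0)), "of", coefs.size)
```

After the search, `search.score(X_test, y_test)` on the held-out test set would give the final R squared, since the test data played no part in choosing alpha or degree.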
Conclusion and Key Takeaways:
- Confirmed negative skewness of ratings: ratings of 5 are considerably more common than any other rating.
- Confirmed a moderate signal between key features and final ratings.
- Identified the 9 most impactful features out of the original 30+ features.
- Positive impact: ‘amazing lectures’, ‘respected’, and ‘give good feedback’
- Negative impact: ‘level of difficulty’
- No material impact: ‘Skip class? You won’t pass’, ‘lots of homework’, ‘tough grader’
Contrary to common wisdom, features such as “lots of homework” and “tough grader” do not have a material impact on ratings. What does this imply? That really got me thinking. After interviewing multiple students, I noted that most students do not necessarily sign up for an easy A when choosing classes. Most of them still want to learn and do their best in class. The professors who make the biggest difference are those who care, are willing to help, and can inspire.
Recommended Next Steps:
- Increase data coverage. Include data from small colleges to large universities, and from liberal arts colleges to research-focused universities.
- Leverage Natural Language Processing (NLP) to further extract key information hidden in review comments.
- Once the model is fine-tuned, it can be generalized and applied to other review websites, such as Amazon and Yelp. This would help uncover key highlights and pain points from customers, which would not only enhance companies’ self-awareness but also drive quality improvement.