This project uses web scraping and machine learning to automatically find past exams and practice materials and classify them by class, year, and type.
As the webmaster and VP Communications for the Computer Science Student Society, I spent a lot of time updating the site this past year. I designed a web app from the ground up to replace the old “database”, which was merely a collection of Drupal pages with manually uploaded files. During my time as webmaster, I received exactly zero new exams to upload to it.
Check out the new exam database! (Reddit post)
I built this in Go. All of the exams are found by a custom web scraper, which can crawl all of the UBC CS websites as well as Piazza and the files available on the department's undergrad servers.
On a side note, while scraping I found numerous unsecured files, including hundreds of completed student exams with names, student numbers, and grades. These were responsibly disclosed to the UBC CS department.
Once I had the files, I built an interface for viewing them and labeling each one by year, term, class, type, and whether it is a sample. With enough initial labeled data, I trained a model with the Google Prediction API. I had initially been working on a custom classifier, but it performed significantly worse (by 10% or more) than the out-of-the-box solution. The Prediction API model was then used to classify the remaining files for display on the website.
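The custom classifier isn't described in the post, so this is not a reconstruction of it. As an illustration of what a from-scratch baseline might look like, here is a multinomial naive Bayes classifier (a swapped-in technique, chosen only as an example) that predicts an exam's type from tokens in its file name or extracted text. `NaiveBayes`, the labels, and the training strings are all hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
	"strings"
)

// tokenRe splits text into lowercase alphanumeric tokens, so a name like
// "cpsc221_final_2015.pdf" becomes cpsc221, final, 2015, pdf.
var tokenRe = regexp.MustCompile(`[a-z0-9]+`)

func tokenize(s string) []string {
	return tokenRe.FindAllString(strings.ToLower(s), -1)
}

// NaiveBayes is a multinomial naive Bayes text classifier with add-one
// (Laplace) smoothing.
type NaiveBayes struct {
	docs       int                       // total training documents
	docCount   map[string]int            // documents per label
	wordCount  map[string]map[string]int // token counts per label
	totalWords map[string]int            // token total per label
	vocab      map[string]bool           // every token seen in training
}

func NewNaiveBayes() *NaiveBayes {
	return &NaiveBayes{
		docCount:   map[string]int{},
		wordCount:  map[string]map[string]int{},
		totalWords: map[string]int{},
		vocab:      map[string]bool{},
	}
}

// Train records one labeled example, e.g. a file name plus extracted text.
func (nb *NaiveBayes) Train(text, label string) {
	nb.docs++
	nb.docCount[label]++
	if nb.wordCount[label] == nil {
		nb.wordCount[label] = map[string]int{}
	}
	for _, tok := range tokenize(text) {
		nb.wordCount[label][tok]++
		nb.totalWords[label]++
		nb.vocab[tok] = true
	}
}

// Predict returns the label with the highest log posterior, breaking exact
// ties by label name so the result does not depend on map iteration order.
func (nb *NaiveBayes) Predict(text string) string {
	best, bestScore := "", math.Inf(-1)
	tokens := tokenize(text)
	v := float64(len(nb.vocab))
	for label, n := range nb.docCount {
		score := math.Log(float64(n) / float64(nb.docs))
		denom := float64(nb.totalWords[label]) + v
		for _, tok := range tokens {
			score += math.Log(float64(nb.wordCount[label][tok]+1) / denom)
		}
		if score > bestScore || (score == bestScore && label < best) {
			best, bestScore = label, score
		}
	}
	return best
}

func main() {
	nb := NewNaiveBayes()
	nb.Train("cpsc221 midterm 1 2014 winter", "midterm")
	nb.Train("midterm exam cpsc 110 term 1", "midterm")
	nb.Train("cpsc213 final exam april 2013", "final")
	nb.Train("final examination cpsc 310", "final")
	nb.Train("practice problems sample questions", "practice")
	fmt.Println(nb.Predict("cpsc 221 final exam 2015")) // final
}
```

One such model could be trained per field (class, year, term, type) simply by changing the labels; nothing here says how it would compare with the hosted API on real data.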
Since the new site is fully interactive, I was also able to add an upload form so students can contribute exams that aren’t publicly available.
Here are some screenshots.