Using Twitter Data to Test the Efficiency of Recommender Systems
By Connie Jeske Crane
Whether you’re looking for dinner options, resolving a music debate with a friend or hunting for a job, the Internet search is a quick and vital action most of us perform multiple times a day. More interestingly, as web-based systems have evolved, they have become able to make actual recommendations (about movies, books or music that we may like, for example) based on our past searches.
While we’ve come to take it for granted that a few simple clicks will put relevant information at our fingertips, we know that behind the scenes the process isn’t quite that simple. With vast and growing stores of available data, developing the best methods of filtering, prioritizing and personalizing search results is an ongoing challenge. This is where recommender systems come in. As Time magazine has defined them, “Recommendation engines are the software that suggests what we should watch or read or listen to next. They help us deal with the millions of choices the Web offers.”
Supervisor Dr. Abdolreza Abhari and Jason Li in their lab with results of their research
At Ryerson, Jason Li, currently completing his master’s in Computer Science, has devoted time to testing and seeking to increase the efficiency of such recommender systems. In a recent research project, he and his supervisor, Ryerson’s Dr. Abdolreza Abhari, simulated a recommender system in an unusual way. Li explains, “My current research is testing the efficiency of systems that are implemented using the MapReduce paradigm. MapReduce paradigm is a popular method used to handle Big Data… In my research, we are simulating a recommender system that uses MapReduce to see the efficiency of the system as we increase the number of nodes.”
Li, who also completed his bachelor’s degree at Ryerson, says that with this project researchers will be able to observe how efficiency changes as the number of nodes increases. Ultimately, he says, “From the efficiency test, we can determine what would be the optimal number of nodes for such massive data mining systems to complete their tasks.”
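To make the node-scaling question concrete, here is a minimal Python sketch of a MapReduce-style job that counts the words two users share, run over a varying number of simulated nodes. It is an illustration only and not Li and Abhari's simulator: the sample tweets, the function names and the cost model (the busiest node's workload plus a fixed coordination overhead per node) are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: (user, tweet text). Not real Twitter data.
TWEETS = [
    ("alice", "jazz vinyl concerts"),
    ("bob", "jazz concerts hiking"),
    ("carol", "hiking camping"),
    ("dave", "vinyl jazz"),
]

# Assumed coordination cost added for every node in the cluster.
OVERHEAD_PER_NODE = 1.5


def map_phase(shard):
    """Map: emit a (word, user) pair for every word in one node's shard."""
    return [(word, user) for user, text in shard for word in text.split()]


def reduce_phase(pairs):
    """Reduce: group users by word, then count shared words per user pair."""
    users_by_word = {}
    for word, user in pairs:
        users_by_word.setdefault(word, set()).add(user)
    shared = Counter()
    for users in users_by_word.values():
        for a, b in combinations(sorted(users), 2):
            shared[(a, b)] += 1
    return shared


def run_job(tweets, nodes):
    """Split tweets across `nodes` simulated nodes and estimate the cost."""
    shards = [tweets[i::nodes] for i in range(nodes)]
    mapped = [map_phase(shard) for shard in shards]
    # Assumed cost model: the map phase takes as long as the busiest node,
    # and every extra node adds a fixed coordination overhead.
    est_cost = max(len(m) for m in mapped) + OVERHEAD_PER_NODE * nodes
    all_pairs = [pair for m in mapped for pair in m]
    return reduce_phase(all_pairs), est_cost


# Estimated cost for 1 to 4 nodes, and the node count that minimizes it.
costs = {n: run_job(TWEETS, n)[1] for n in range(1, 5)}
best_nodes = min(costs, key=costs.get)
```

In this toy model, adding nodes shrinks each node's share of the work but raises the coordination overhead, so the estimated cost falls and then rises again. That trade-off is the reason an optimal node count exists at all; a real simulator would replace the assumed cost model with measured or modelled timings.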
Graphical user interface of the simulator of the recommender system created by Jason
The other interesting twist to this project was Li and Abhari’s collection and use of real data from Twitter users. A user’s tweets can contain information similar to that in another user’s tweets, and recommendation systems can use this overlap to suggest people with the same interests. However, collecting millions of real tweets is very time-consuming. “In this research, we aim to provide stochastic [or randomly determined] tweets that can be used for testing recommender systems with a large data,” write the authors, adding that their research proposed a method for creating stochastic user tweets (after first collecting real public Twitter data) to aid in the simulation of a recommender system using millions of tweets. To the best of their knowledge, say Li and Abhari, no other work has tried to create stochastic data using Twitter messages.
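The article does not describe how the authors generate their stochastic tweets, so the following Python sketch shows only one simple way the general idea could work: measure word frequencies and tweet lengths in a small set of collected tweets, then sample as many synthetic tweets as needed from those distributions. The seed tweets and the function names are hypothetical, and this is not the method from the paper.

```python
import random
from collections import Counter

# Hypothetical stand-ins for a small sample of collected public tweets.
SEED_TWEETS = [
    "loving the new jazz album",
    "hiking trails near the lake",
    "new album drops friday",
    "jazz night at the lake",
]


def fit(tweets):
    """Record how often each word occurs and how long each tweet is."""
    word_counts = Counter(word for t in tweets for word in t.split())
    lengths = [len(t.split()) for t in tweets]
    return word_counts, lengths


def generate(word_counts, lengths, n, seed=0):
    """Sample n synthetic tweets from the fitted distributions."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    words = list(word_counts)
    weights = [word_counts[w] for w in words]
    tweets = []
    for _ in range(n):
        k = rng.choice(lengths)  # draw a length seen in the sample
        tweets.append(" ".join(rng.choices(words, weights=weights, k=k)))
    return tweets


word_counts, lengths = fit(SEED_TWEETS)
synthetic = generate(word_counts, lengths, n=1000)
```

Sampling words independently produces text that is not readable, but it preserves the word-overlap statistics that a similarity-based recommender relies on, and it scales to millions of tweets without further collection.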
Jason Li at the Nauticus that features the Battleship Wisconsin in Norfolk
Li says the project marked a few other firsts as well. This was the first time he travelled to a conference, namely the 2017 Spring Simulation Multi-Conference in Virginia Beach, Va. The paper he co-authored with Abhari, “Generating Stochastic Data to Simulate a Twitter User,” was also his first publication, and it appeared in the 2017 Proceedings of the 20th Communications & Networking Simulation Symposium (CNS 2017) by The Society for Modeling and Simulation International (SCS).
Li, who says he enjoyed interacting with fellow students in Ryerson’s Computer Science program, has acquired a lot of knowledge in his field. “Last but not least,” he says, “I enjoyed the time I was working with my supervisor, Dr. Abdolreza Abhari.”