Over the last few years, our use of the cloud to store all sorts of digital files has grown by leaps and bounds. In 2012, the global volume of digital data reached 2.8 trillion GB (2.8 zettabytes). By 2020, that volume is projected to reach 40 zettabytes.
The storage challenge created by this phenomenal growth is known as the “Big Data storage problem.” Tackling that problem is the focus of research by Ryerson University Computer Science PhD candidate Fatema Rashid. Specifically, Fatema is exploring “data deduplication” as a way to ease the problem. Deduplication identifies and removes duplicate copies of data from cloud-based storage systems, so that each unique piece of content is stored only once.
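To make the basic idea concrete, here is a minimal sketch of generic, file-level deduplication, assuming content is identified by a SHA-256 fingerprint. It is a textbook illustration only, not the video-specific scheme from Fatema's research; the `DedupStore` class and its method names are hypothetical.

```python
import hashlib


class DedupStore:
    """Hypothetical toy store: identical files are physically kept only once."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # fingerprint -> the single stored copy
        self.index: dict[str, str] = {}    # user-visible file name -> fingerprint

    def put(self, name: str, data: bytes) -> bool:
        """Store a file; return True if its content was new, False if deduplicated."""
        fingerprint = hashlib.sha256(data).hexdigest()
        self.index[name] = fingerprint
        if fingerprint in self.blobs:
            return False  # duplicate content: only a reference is recorded
        self.blobs[fingerprint] = data
        return True

    def get(self, name: str) -> bytes:
        """Resolve a file name to its (possibly shared) stored content."""
        return self.blobs[self.index[name]]

    def stored_bytes(self) -> int:
        """Bytes actually occupied after deduplication."""
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
video = b"\x00\x01" * 1000  # 2,000 bytes of illustrative "video" content
store.put("alice/holiday.mp4", video)
store.put("bob/holiday_copy.mp4", video)          # same content, not stored again
store.put("carol/other.mp4", b"different content")
```

Three files are uploaded, but only two distinct contents occupy storage (2,017 bytes instead of 4,017). Real systems often deduplicate at the block or chunk level rather than whole files, and this is also where the privacy concerns below arise: a provider that fingerprints content can, in principle, learn which users hold identical data.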
Privacy and security concerns
Fatema’s write-up of her recent experiments and findings on video deduplication, co-authored with her supervisors, Professors Ali Miri and Isaac Woungang, has been accepted as an application track paper for the IEEE International Congress on Big Data 2015. One of the main issues Fatema raises in the paper is that although deduplication saves storage space and cuts storage costs, it also raises serious data privacy and security concerns. As she explains, “While users want to save costs by allowing cloud storage providers (CSPs) to deduplicate their data, they don’t want CSPs to compromise the privacy of their data.”
Proof of storage protocols
Building on earlier research on video deduplication using the H.264 compression standard, Fatema’s current paper proposes two storage-related protocols: a proof of retrievability (POR) protocol and a proof of ownership (POW) protocol. One novel feature of these protocols is that they can be run both by the users of a cloud storage service (to verify that their videos are stored securely) and by CSPs themselves (to authenticate a video’s “true owner”).
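As a rough illustration of what a proof of ownership involves, the sketch below shows a generic Merkle-tree challenge-response of the kind discussed in the deduplication literature. It is an assumption-laden toy, not the protocol from Fatema's paper: the block size, the number of challenged blocks, and all function names are hypothetical. The provider keeps only the Merkle root of a file; a user claiming ownership must return randomly chosen blocks together with their authentication paths, which is hard to do without holding the actual content (a short hash of the file is not enough).

```python
import hashlib
import secrets

BLOCK_SIZE = 64  # hypothetical block size in bytes, kept small for illustration


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def split_blocks(data: bytes) -> list[bytes]:
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)] or [b""]


def build_tree(data: bytes) -> list[list[bytes]]:
    """Return every Merkle tree level, leaves first and root last."""
    level = [h(b) for b in split_blocks(data)]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # odd-sized level: pair the last node with itself
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def challenge(num_blocks: int, k: int = 3) -> list[int]:
    """Verifier side: pick k random block indices to challenge."""
    return [secrets.randbelow(num_blocks) for _ in range(k)]


def prove(data: bytes, index: int) -> tuple[bytes, list[bytes]]:
    """Prover side: return the challenged block and its authentication path."""
    path = []
    i = index
    for level in build_tree(data)[:-1]:
        sibling = i ^ 1
        # A missing sibling means the node was paired with itself when building.
        path.append(level[sibling] if sibling < len(level) else level[i])
        i //= 2
    return split_blocks(data)[index], path


def verify(root: bytes, index: int, block: bytes, path: list[bytes]) -> bool:
    """Verifier side: recompute the root from the block and its path."""
    node = h(block)
    i = index
    for sibling in path:
        node = h(node + sibling) if i % 2 == 0 else h(sibling + node)
        i //= 2
    return node == root


video = bytes(range(256)) * 10        # 2,560 bytes -> 40 blocks
root = build_tree(video)[-1][0]       # the provider stores only this root
n = len(split_blocks(video))
owner_ok = all(verify(root, i, *prove(video, i)) for i in challenge(n))

tampered = bytearray(video)
tampered[0] ^= 1                      # a claimant without the exact content
cheater_ok = verify(root, 0, *prove(bytes(tampered), 0))
```

Here `owner_ok` is `True` and `cheater_ok` is `False`. Challenging only a few random blocks keeps each check cheap; the trade-off is that a party holding most, but not all, of a file may occasionally pass, which is why published schemes add further safeguards such as encoding the file before building the tree.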
“Based on our results, we believe that our proposed scheme will ensure cloud-storage users that their video data are safe, secure and protected,” says Fatema. “That would allay many people’s fears over storing their videos in the cloud while supporting deduplication as a way to solve the Big Data problem.” In her future work, she hopes to modify the proposed POR and POW protocols to include support for dynamic data and third-party auditing.