Computer Science Department
School of Computer Science, Carnegie Mellon University
Large Scale Data Analytics of User Behavior for Improving Content Delivery
The Internet is fast becoming the de facto content delivery network of the world, supplanting TV and physical media as the primary method of distributing large files to ever-increasing numbers of users at the fastest possible speeds. Recent trends have, however, posed challenges to various players in the Internet content delivery ecosystem. These trends include exponentially increasing traffic volume, rising user expectations for the quality of content delivery, and the growth and ubiquity of mobile traffic.
For example, exponentially increasing traffic, primarily caused by the popularity of Internet video, is stressing existing Content Delivery Network (CDN) infrastructure. Similarly, content providers want to improve user experience to match rising user expectations in order to retain users and sustain their advertisement-based and subscription-based revenue models. Finally, although mobile traffic is increasing, cellular networks are not as well designed as their wireline counterparts, resulting in poorer quality of experience for mobile users. Content providers, CDNs, and network operators everywhere face these challenges and seek to design and manage their networks better in order to improve content delivery and provide better quality of experience.
This thesis identifies a new opportunity to tackle these challenges with the help of big data analytics. We show that large-scale analytics on user behavior data can be used to inform the design of different aspects of the content delivery systems. Specifically, we show that insights from large-scale analytics can lead to better resource provisioning to augment the existing CDN infrastructure and tackle increasing traffic. Further, we build predictive models using machine learning techniques to understand users' expectations for quality. These models can be used to improve users' quality of experience. Similarly, we show that even mobile network operators who do not have access to client-side or server-side logs on user access patterns can use large-scale data analytics techniques to extract user behavior from network traces and build machine learning models that help configure the network better for improved content delivery.