Analyzing TV Data

Whether or not you like football, the Super Bowl is a spectacle. There’s a little something for everyone at your Super Bowl party. For the sports fan, there’s drama in the form of blowouts, comebacks, and controversy. There are the ridiculously expensive ads, some hilarious, others gut-wrenching, thought-provoking, or just weird. There are the halftime shows with the biggest musicians in the world, sometimes riding giant mechanical tigers or leaping from the roof of the stadium. It’s a show, baby. And in this notebook, we’re going to find out how some of the elements of this show interact with each other. After exploring and cleaning our data a little, we’re going to answer questions like:

  • What are the most extreme game outcomes?
  • How does the game affect television viewership?
  • How have viewership, TV ratings, and ad cost evolved over time?
  • Who are the most prolific musicians in terms of halftime show performances?
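
The first question can be approached by ranking games on their margin of victory. The sketch below is a minimal illustration, assuming each game record carries the winner’s and loser’s points; the `Game` class, its field names, and the rows are hypothetical placeholders rather than the actual Super Bowl dataset.

```python
from dataclasses import dataclass


@dataclass
class Game:
    # Hypothetical record layout; the real dataset's columns may differ.
    label: str
    winner_pts: int
    loser_pts: int

    @property
    def difference(self) -> int:
        # Margin of victory: large values are blowouts, small values are close games.
        return self.winner_pts - self.loser_pts


def extreme_outcomes(games, n=1):
    """Return the n largest blowouts and the n closest games."""
    ranked = sorted(games, key=lambda g: g.difference)
    return ranked[-n:][::-1], ranked[:n]


# Illustrative placeholder rows only, not real Super Bowl results.
games = [
    Game("Game A", 34, 31),
    Game("Game B", 48, 10),
    Game("Game C", 21, 20),
    Game("Game D", 27, 17),
]

blowouts, close_games = extreme_outcomes(games)
print([g.label for g in blowouts])     # ['Game B']
print([g.label for g in close_games])  # ['Game C']
```

A real analysis would more likely load the results with pandas and plot the distribution of point differences, but the ranking logic is the same.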

Mobile Games A/B Testing with Cookie Cats

Cookie Cats is a hugely popular mobile puzzle game developed by Tactile Entertainment. It’s a classic “connect three” style puzzle game where the player must connect tiles of the same color in order to clear the board and win the level. It also features singing cats. We’re not kidding!

As players progress through the game, they will encounter gates that force them to either wait some time or make an in-app purchase before they can progress. In this project, we will analyze the results of an A/B test in which the first gate in Cookie Cats was moved from level 30 to level 40. In particular, we will analyze the impact on player retention.
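
One common way to compare retention between the two versions is to bootstrap the difference in retention rates. The following is a minimal sketch under stated assumptions: each player is reduced to a True/False retention flag, the groups are labeled `gate_30` and `gate_40`, and the flag counts are invented for illustration, not taken from the actual test results.

```python
import random


def retention_rate(flags):
    # Fraction of players who came back (True) among all players in a group.
    return sum(flags) / len(flags)


def bootstrap_diff(gate_30, gate_40, n_boot=1000, seed=0):
    """Bootstrap the difference in retention (gate_30 minus gate_40).

    Returns the resampled differences and the share of resamples in which
    gate_30 retained more players than gate_40.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        s30 = rng.choices(gate_30, k=len(gate_30))
        s40 = rng.choices(gate_40, k=len(gate_40))
        diffs.append(retention_rate(s30) - retention_rate(s40))
    prob_30_better = sum(d > 0 for d in diffs) / n_boot
    return diffs, prob_30_better


# Hypothetical retention flags; these counts are made up for the example.
gate_30 = [True] * 190 + [False] * 810  # 19.0% retained
gate_40 = [True] * 180 + [False] * 820  # 18.0% retained

diffs, p = bootstrap_diff(gate_30, gate_40)
print(f"observed difference: {retention_rate(gate_30) - retention_rate(gate_40):.3f}")
print(f"P(gate_30 retains better): {p:.2f}")
```

The printed probability estimates how often, under resampling, the level-30 gate retains more players; how high it must be before acting on it is a business decision rather than something the code settles.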

Movies ETL

Amazing Prime Video is a platform for streaming movies and TV shows. The Amazing Prime Video team would like to develop an algorithm to predict which low-budget movies being released will become popular, so that they can buy the streaming rights at a bargain. There are two data sources: a scrape of Wikipedia for all movies released since 1990, and the rating data from the MovieLens website.

The task is to extract the data from the two sources, transform it into one clean dataset, and finally load that dataset into a SQL table.
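
The three stages can be sketched with Python’s standard library alone. This is a hypothetical, minimal pipeline: the inline records stand in for the Wikipedia scrape and the ratings file, the shared `imdb_id` join key and the `Budget` string format are assumptions, and SQLite stands in for whichever SQL database the project actually targets.

```python
import csv
import io
import re
import sqlite3

# Extract (hypothetical stand-ins): a Wikipedia-style scrape and a ratings CSV.
wiki_movies = [
    {"title": "Movie One ", "imdb_link": "https://www.imdb.com/title/tt0000001/", "Budget": "$1.5 million"},
    {"title": "Movie Two", "imdb_link": "https://www.imdb.com/title/tt0000002/", "Budget": "$800,000"},
    {"title": "Movie Two", "imdb_link": "https://www.imdb.com/title/tt0000002/", "Budget": "$800,000"},  # duplicate
]
ratings_csv = io.StringIO("imdb_id,rating\ntt0000001,4.0\ntt0000001,3.0\ntt0000002,5.0\n")


def parse_budget(text):
    """Convert strings like '$1.5 million' or '$800,000' to dollars (float)."""
    match = re.search(r"\$\s*([\d.,]+)\s*(million|billion)?", text, re.IGNORECASE)
    if not match:
        return None
    value = float(match.group(1).replace(",", ""))
    scale = {"million": 1e6, "billion": 1e9}.get((match.group(2) or "").lower(), 1)
    return value * scale


def transform(movies, ratings_file):
    # Clean scraped rows; keying by IMDb ID drops duplicate entries.
    clean = {}
    for row in movies:
        imdb_id = re.search(r"tt\d{7,8}", row["imdb_link"]).group(0)
        clean[imdb_id] = {"title": row["title"].strip(), "budget": parse_budget(row["Budget"])}
    # Aggregate ratings per movie as (sum, count).
    totals = {}
    for r in csv.DictReader(ratings_file):
        s, n = totals.get(r["imdb_id"], (0.0, 0))
        totals[r["imdb_id"]] = (s + float(r["rating"]), n + 1)
    # Merge both sources on the shared key into one clean row per movie.
    merged = []
    for imdb_id, info in clean.items():
        s, n = totals.get(imdb_id, (0.0, 0))
        merged.append((imdb_id, info["title"], info["budget"], s / n if n else None, n))
    return merged


def load(rows, conn):
    # Load: create the target table and insert the merged rows.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS movies ("
        "imdb_id TEXT PRIMARY KEY, title TEXT, budget REAL, "
        "mean_rating REAL, rating_count INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO movies VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(wiki_movies, ratings_csv), conn)
for row in conn.execute("SELECT * FROM movies ORDER BY imdb_id"):
    print(row)
```

In practice, the transform step is where most of the work lies (inconsistent column names, alternate titles, varied budget and date formats), and pandas is the usual tool for it; the sketch only shows how the stages hand data to one another.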

Investigating Netflix Movies and Guest Stars in The Office

Netflix! What started in 1997 as a DVD rental service has since exploded into the largest entertainment/media company by market capitalization, boasting over 200 million subscribers as of January 2021. Given the large number of movies and series available on the platform, it is a perfect opportunity to flex our data manipulation skills and dive into the entertainment industry. Our friend has also been brushing up on their Python skills and has taken a first crack at a CSV file containing Netflix data. For their first order of business, they have been performing some analyses, and they believe that the average duration of movies has been declining. As evidence of this, they have provided us with the following information. For the years from 2011 to 2020, the average movie durations are 103, 101, 99, 100, 100, 95, 95, 96, 93, and 90, respectively.
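
Before looking further back, the friend’s claim can be quantified with an ordinary least-squares line through the ten quoted averages. This sketch uses only the numbers given above:

```python
# Average movie durations (minutes) for 2011-2020, as quoted in the text.
years = list(range(2011, 2021))
durations = [103, 101, 99, 100, 100, 95, 95, 96, 93, 90]


def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    return cov / var


slope = least_squares_slope(years, durations)
print(f"{slope:.2f} minutes per year")  # -1.26 minutes per year
```

The fitted slope is about -1.26 minutes per year, so the quoted figures do point downward; ten yearly averages cannot show whether the trend holds over a longer period or has another cause, which is what the questions below address.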

The task is to answer the following questions:

  • What does this trend look like over a longer period of time?
  • Is this explainable by something like the genre of entertainment?
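
For the second question, one approach is to compare average durations within each genre, not just overall. The sketch below uses invented rows (`release_year`, `genre`, duration in minutes), not real Netflix data, chosen to show how a shift in genre mix alone can lower the overall average while a single genre’s durations barely change.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (release_year, genre, duration_minutes). Not real Netflix data.
movies = [
    (2011, "Dramas", 110), (2011, "Comedies", 100), (2011, "Documentaries", 85),
    (2020, "Dramas", 108), (2020, "Documentaries", 80), (2020, "Documentaries", 78),
    (2020, "Stand-Up", 60),
]


def mean_duration_by(rows, key):
    """Group rows by `key(row)` and return each group's mean duration."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {k: mean(v) for k, v in sorted(groups.items())}


by_year = mean_duration_by(movies, key=lambda r: r[0])
by_year_genre = mean_duration_by(movies, key=lambda r: (r[0], r[1]))
print(by_year)
print(by_year_genre)
```

In this toy data the overall mean falls by almost 17 minutes while dramas change by only 2, because 2020 contains more documentaries and a stand-up special.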

Movie Recommendation From Reviews

Recommender systems are among the most popular and widely adopted applications of machine learning. They are typically used to recommend entities to users, and these entities can be anything, such as products, movies, or services.

We will be building a movie recommendation system here that, based on data and metadata pertaining to different movies, tries to recommend similar movies of interest!
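
The text does not say how similarity is measured; one common content-based approach, shown here as an assumption, is to turn each movie’s metadata into a bag-of-words vector and rank the other movies by cosine similarity. The titles and metadata strings below are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical metadata "soup" (genre, keywords, plot words) per movie.
movies = {
    "Space Voyage": "science fiction space astronaut alien planet rescue",
    "Galaxy Siege": "science fiction alien invasion space battle",
    "Kitchen Love": "romance comedy chef restaurant love",
    "Paris Hearts": "romance drama love paris letters",
}


def vectorize(text):
    # Bag-of-words term counts after lowercasing and keeping only letters.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(title, catalog, k=2):
    """Return the k titles whose metadata is most similar to `title`."""
    vectors = {name: vectorize(text) for name, text in catalog.items()}
    target = vectors[title]
    scores = [(cosine(target, vec), name) for name, vec in vectors.items() if name != title]
    scores.sort(reverse=True)
    return [name for _, name in scores[:k]]


print(recommend("Space Voyage", movies, k=1))  # ['Galaxy Siege']
```

Real systems typically weight terms with TF-IDF so that very common words count for less, but the ranking step stays the same.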

Classify Song Genres From Audio Data

Over the past few years, streaming services with huge catalogs have become the primary means through which most people listen to their favorite music. At the same time, the sheer amount of music on offer can leave users a bit overwhelmed when they look for newer music that suits their tastes.

For this reason, streaming services have looked into ways of categorizing music to allow for personalized recommendations. One method involves direct analysis of the raw audio information in a given song, scoring the raw data on a variety of metrics. Today, we’ll be examining data compiled by a research group known as The Echo Nest. Our goal is to look through this dataset and classify songs as either ‘Hip-Hop’ or ‘Rock’, all without listening to a single one ourselves. In doing so, we will learn how to clean our data, do some exploratory data visualization, and use feature reduction to prepare our data for some simple machine learning algorithms, such as decision trees and logistic regression.
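
Logistic regression, one of the models mentioned above, can be sketched from scratch to show what the classifier learns. This is a minimal illustration, not the project’s code: the two features (`speechiness`, `acousticness`) and their values are hypothetical stand-ins for Echo Nest-style audio metrics, labels use 1 for Hip-Hop and 0 for Rock, and the model is trained with plain batch gradient descent.

```python
import math

# Hypothetical (speechiness, acousticness) scores; label 1 = Hip-Hop, 0 = Rock.
X = [
    (0.30, 0.10), (0.25, 0.20), (0.35, 0.15), (0.28, 0.05),  # Hip-Hop-like
    (0.05, 0.30), (0.04, 0.10), (0.06, 0.25), (0.03, 0.20),  # Rock-like
]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def standardize(rows):
    # Scale each feature to zero mean and unit variance (guarding zero variance).
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0 for c, m in zip(cols, means)]
    scaled = [tuple((v - m) / s for v, m, s in zip(r, means, stds)) for r in rows]
    return scaled, means, stds


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for r, t in zip(rows, labels):
            # Prediction error: predicted probability minus true label.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, r)) + b) - t
            for j, xj in enumerate(r):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(row, w, b, means, stds):
    # Apply the training-set scaling, then threshold the probability at 0.5.
    z = sum(wi * (v - m) / s for wi, v, m, s in zip(w, row, means, stds)) + b
    return "Hip-Hop" if sigmoid(z) >= 0.5 else "Rock"


Xs, means, stds = standardize(X)
w, b = fit_logistic(Xs, y)
print(predict((0.32, 0.12), w, b, means, stds))
print(predict((0.04, 0.22), w, b, means, stds))
```

In a real notebook, scikit-learn’s `StandardScaler`, `PCA`, `LogisticRegression`, and `DecisionTreeClassifier` would replace this hand-written code, with the feature-reduction step running between scaling and fitting.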