YouTube
YouTube reco system
1. Considers personal activity, i.e. favorites, likes, videos watched, people subscribed to, and videos watched till the end. 2. Builds a co-visitation graph over an expanding set of videos.
Issues to consider for YouTube
1. How to rank 2. What info is available about a video 3. Reco system 4. Distribution system 5. How to monetize the website.
YouTube info
1. In Dec 2016, ranked as the 2nd most popular site by Alexa Internet. 2. Hulu and CBC also upload and share some of their content as part of the YouTube partnership program.
YouTube ranking factors
1. Metadata 2. HD > lower quality 3. Number of views, likes, and shares.
Ranking in video search engines
1. Metadata and user prefs. 2. Upload date. 3. Duration. 4. User ratings. 5. Views.
What info YouTube collects about a video
1. Name 2. Description 3. Title 4. Tags 5. Thumbnails 6. Copyright information 7. Age restrictions.
YouTube as a content aggregator
1. Not a search engine in the conventional sense. 2. Only searches videos uploaded by users. 3. Doesn't return anything except videos in search results. 4. Does not crawl the web for content.
YouTube upload flow
1. User clicks upload from desktop. 2. Video is sent to the San Jose data center. 3. Video is uploaded to Server Breach, located in Louisiana / Virginia. 4. Transcoded video copies are sent to LA, Chicago, Ashburn, and Washington DC.
Details on ContentID
1. User uploads video 2. Transcoded to different formats 3. Spectrogram used for hashing audio 4. Sample sections of video are hashed 5. Matched against audio and video hashes of content submitted by copyright owners.
What does metadata info include for ranking by YouTube
1. Video titles 2. Tags 3. Descriptions 4. Links to websites and social profiles.
Association Rule Mining
1. c(i, j) is the co-visitation count for videos vi and vj. 2. The relatedness of the two videos is defined as c(i, j) / f(i, j). 3. f(i, j) = c(i) * c(j) is a normalization function that accounts for the global popularity of each video, where c(i) and c(j) are the total occurrence counts of vi and vj.
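A minimal sketch of the relatedness formula, assuming watch history is available as a list of sessions (each session is a list of video ids). The function names and the sample sessions are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter
from itertools import combinations


def covisitation_counts(sessions):
    """Return per-video counts c(i) and pairwise co-visitation counts c(i, j)."""
    single = Counter()
    pair = Counter()
    for session in sessions:
        videos = sorted(set(session))          # count each video once per session
        single.update(videos)
        pair.update(combinations(videos, 2))   # keys are ordered pairs (i < j)
    return single, pair


def relatedness(i, j, single, pair):
    """r(i, j) = c(i, j) / (c(i) * c(j)); 0.0 if either video was never seen."""
    key = (i, j) if i <= j else (j, i)
    denom = single[i] * single[j]
    return pair[key] / denom if denom else 0.0


# Hypothetical sessions: "a" is globally popular, "d" is rare.
sessions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a", "d"], ["b", "c"]]
single, pair = covisitation_counts(sessions)
```

Here "a" and "b" are co-watched twice, but because "a" appears in four sessions the pair scores 2 / (4 * 3), lower than "b" and "c" at 2 / (3 * 3). That is the effect of normalizing by global popularity.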
Caveats and Facts about YouTube
1. 8 different video formats supported 2. 128 GB max video size 3. Default allowed duration is 15 minutes but can be extended 4. Videos are never deleted by YouTube.
YouTube creation
3 former PayPal employees in Feb 2005
YouTube's caching architecture
3-tier hierarchical caching architecture: 38 primary caching locations, 8 secondary locations, and 5 tertiary locations, spread over 5 continents.
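A rough sketch of how a lookup through three cache tiers might behave. The notes only state that the tiers exist, so the fall-through order, the fill-on-miss policy, and the `TieredCache` class are all assumptions made for illustration.

```python
class TieredCache:
    """Toy 3-tier cache: check primary, then secondary, then tertiary, then origin."""

    TIER_NAMES = ("primary", "secondary", "tertiary")

    def __init__(self, origin):
        self.origin = origin                        # hypothetical store: id -> bytes
        self.tiers = [{} for _ in self.TIER_NAMES]  # one dict per tier

    def get(self, video_id):
        """Return (data, where_it_was_found) and copy the data into closer tiers."""
        for level, tier in enumerate(self.tiers):
            if video_id in tier:
                data = tier[video_id]
                self._fill(video_id, data, level)
                return data, self.TIER_NAMES[level]
        data = self.origin[video_id]                # miss in every tier
        self._fill(video_id, data, len(self.tiers))
        return data, "origin"

    def _fill(self, video_id, data, upto):
        """Assumed policy: populate every tier closer to the user than the hit."""
        for tier in self.tiers[:upto]:
            tier[video_id] = data


cache = TieredCache({"v1": b"video-bytes"})
first = cache.get("v1")    # served from origin, then cached in all tiers
second = cache.get("v1")   # served from the primary tier
```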
What is the % of video clicks accounted for by recommendation videos
60%
YouTube identification method for videos
64 bit, 11 character fixed length unique string per video.
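One way a 64-bit value fits into exactly 11 characters is URL-safe base64: 8 bytes encode to 12 characters including one `=` of padding, and dropping the padding leaves 11. The sketch below assumes that encoding; the function names are hypothetical and this is not a statement of how YouTube actually generates ids.

```python
import base64
import secrets


def encode_video_id(raw):
    """Encode exactly 8 bytes (64 bits) as an 11-character URL-safe string."""
    if len(raw) != 8:
        raise ValueError("expected 8 bytes")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_video_id(video_id):
    """Recover the 8 raw bytes; one '=' restores the padding that was stripped."""
    return base64.urlsafe_b64decode(video_id + "=")


def make_video_id():
    """Assumed generation scheme: a random 64-bit value per video."""
    return encode_video_id(secrets.token_bytes(8))
```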
What % video rich snippets are YouTube's
91%
Content Delivery Networks
A collection of content servers with the intelligence of being able to select a subset of servers based on a user's location and the content being requested.
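A minimal sketch of that selection step, assuming each server record carries a location and the set of content it holds. The server names, approximate coordinates, and the great-circle distance metric are illustrative assumptions; real CDNs typically also weigh load and network conditions.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def select_servers(servers, user_location, content_id, k=2):
    """Return the names of the k nearest servers that hold content_id."""
    holders = [s for s in servers if content_id in s["content"]]
    holders.sort(key=lambda s: haversine_km(s["location"], user_location))
    return [s["name"] for s in holders[:k]]


# Hypothetical servers with approximate coordinates.
servers = [
    {"name": "san-jose", "location": (37.34, -121.89), "content": {"popular"}},
    {"name": "ashburn", "location": (39.04, -77.49), "content": {"popular", "rare"}},
    {"name": "amsterdam", "location": (52.37, 4.90), "content": {"popular", "rare"}},
]
los_angeles = (34.05, -118.24)
```

With these records, a Los Angeles user asking for "popular" gets san-jose first, while a request for "rare" can only be served from ashburn or amsterdam.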
Video search engine.
A search engine that crawls the web for video content.
CDN service providers
Akamai, Limelight, Level 3
What devices can play YouTube videos
Android devices, Apple phones, Xbox, PlayStation, Apple TV, iPod touch
Why does the reco system need to be updated frequently?
Because new videos are added very frequently, and a video may go viral at any time and greatly alter the statistics.
Some video search engines
Bing, Munax, ScienceStage
YouTube Monetization Copyright issue resolution
Copyright material identification system built by Google. It lets the original owner either take down the content or let it remain with ads, in which case the revenue is split between YouTube and the owner.
What is ContentID
Database created by YouTube for keeping fingerprints of copyrighted content.
Indexing in video search engines
Done by acquiring metadata related to the video: 1. Quality 2. Author 3. Title 4. Creation date. 5. Tags 6. Description.
When does YouTube collect info about a video
During upload time
Vimeo
First to allow HD videos, focuses on short artsy videos.
Co-visitation relatedness graph
For a given seed video, find the top N videos ranked by the relatedness formula from association rule mining. The related videos form a directed graph.
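A small sketch of building such a graph, assuming relatedness is c(i, j) / (c(i) * c(j)) computed from watch sessions. The `related_graph` function, the alphabetical tie-break, and the sample sessions are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def related_graph(sessions, top_n):
    """Map each seed video to its top_n most related videos (outgoing edges)."""
    single, pair = Counter(), Counter()
    for session in sessions:
        videos = sorted(set(session))
        single.update(videos)
        pair.update(combinations(videos, 2))

    # Relatedness is symmetric, so store each score under both endpoints.
    scores = defaultdict(dict)
    for (i, j), c_ij in pair.items():
        r = c_ij / (single[i] * single[j])
        scores[i][j] = r
        scores[j][i] = r

    graph = {}
    for seed, candidates in scores.items():
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(candidates, key=lambda v: (-candidates[v], v))
        graph[seed] = ranked[:top_n]
    return graph


sessions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a", "d"], ["b", "c"]]
graph = related_graph(sessions, top_n=2)
```

The graph is directed because the top-N cut is taken per seed: here "c" lists "a" among its related videos, but "a" does not list "c", since "d" and "b" rank ahead of it.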
What special thing had to be done to play YouTube videos on Apple devices
Had to be encoded in H.264, which is Apple's preferred video format.
Video streaming services
Hulu, Netflix, Amazon Prime.
Vevo
Joint venture between Universal Music Group, Warner Music Group, and Sony Music Entertainment.
Dissecting rare videos study
A rare video was requested from California. It was initially served from the Netherlands, but later requests were served from California. So, videos are moved to where they are wanted.
Do all video search engines crawl the web for content?
No, some rely on videos being uploaded by users e.g. YouTube.
YouTube Search Engine stats
Processes more than 3 billion searches a month, which is more than Yahoo, AOL, and Ask combined. 4 billion videos a day. 800 million unique users a month. 70% of traffic comes from outside the US.
What does YouTube want to achieve with their CDN
Reduce RTT (round trip time)
What is spectrogram
Time-frequency graph for audio files. Peak points are marked as part of the hashing process for ContentID copyright matching.
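A toy sketch of peak-based audio hashing, assuming the simplest possible scheme: split the signal into frames, take the strongest frequency bin in each frame, and hash the sequence of peaks. ContentID's real algorithm is not described in these notes, so the frame size, the naive DFT, the single peak per frame, and the SHA-1 digest are all assumptions.

```python
import cmath
import hashlib
import math


def spectrogram(samples, frame_size):
    """Magnitude spectrum of each non-overlapping frame, using a naive DFT."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        spectrum = []
        for k in range(frame_size // 2):       # keep the non-redundant bins
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n, x in enumerate(frame))
            spectrum.append(abs(coeff))
        frames.append(spectrum)
    return frames


def peak_fingerprint(samples, frame_size=32):
    """Return (peak bin per frame, hex digest of the peak sequence)."""
    peaks = [max(range(len(s)), key=s.__getitem__)
             for s in spectrogram(samples, frame_size)]
    digest = hashlib.sha1(",".join(map(str, peaks)).encode("ascii")).hexdigest()
    return peaks, digest


def tone(bin_index, length, frame_size=32):
    """Hypothetical test signal: a sine wave centred on one DFT bin."""
    return [math.sin(2 * math.pi * bin_index * n / frame_size) for n in range(length)]


signal = tone(4, 64) + tone(9, 64)
peaks, digest = peak_fingerprint(signal)
quiet_peaks, quiet_digest = peak_fingerprint([0.5 * x for x in signal])
```

Because only the position of each peak is hashed, a quieter copy of the same audio yields the same fingerprint in this sketch, which hints at why peaks are a useful basis for matching.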
Video hosting services
Vimeo Vevo Dailymotion YouTube
Video rich snippet
When someone searches for a query on Google, a small video also appears in the results, telling the user there is a video that helps with the topic. Google is accused of being biased towards displaying YouTube video rich snippets.
Is it legal to crawl YouTube?
Yes
In addition to the usual metadata, what else is used for indexing in video search engines?
Transcription and subtitles.