Goals:

  1. Data Ingestion
  2. Data Lake
  3. ETL Design
  4. Scalability
  5. AWS Cloud
  6. Reporting

Perishable Insights → Data loses value quickly over time

Timely decision-making requires new data within minutes

Data Generated vs. Data available for Analysis

YouTube top trending dataset → from Kaggle

Trending YouTube Video Statistics

There is also an updated version of the Kaggle dataset:

YouTube Trending Video Dataset (updated daily)

On-premises data center vs. cloud data center

Create an IAM user for AWS and create an access key
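
The IAM step can also be scripted instead of done in the console. Below is a minimal sketch, assuming the boto3 library is installed and that admin credentials are already configured locally. The helper name create_iam_user_with_key and the user name "data-eng-user" are hypothetical, chosen only for illustration. The helper takes the client as a parameter so it can be tried with a stand-in object before touching a real account.

```python
def create_iam_user_with_key(iam_client, user_name):
    """Create an IAM user and one access key for that user.

    `iam_client` is assumed to behave like boto3.client("iam").
    Returns a tuple (access_key_id, secret_access_key).
    """
    # Create the user first; the access key is attached to this user.
    iam_client.create_user(UserName=user_name)

    # AWS returns the secret only in this response, so capture it now.
    response = iam_client.create_access_key(UserName=user_name)
    key = response["AccessKey"]
    return key["AccessKeyId"], key["SecretAccessKey"]


def main():
    # Imported here so the helper above can be used without boto3 installed.
    import boto3  # third-party library

    iam = boto3.client("iam")
    # "data-eng-user" is a placeholder name, not taken from the notes.
    key_id, _secret = create_iam_user_with_key(iam, "data-eng-user")
    # Print only the key ID; the secret should go to a secure store.
    print("Access key ID:", key_id)
```

Calling main() runs this against the real account. The new user has no permissions until a policy is attached, and the secret access key cannot be retrieved again later, so it should be saved at creation time.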