CSI_2024_DataEngineering

An internship in Data Engineering. The topics below are the applied skills used to complete the assigned tasks over 8 weeks, including the final project.

Celebal Summer Internship - 2024 (MAY - JULY)

WEEK1 & WEEK2

Task Assigned

  • Problem solving in Python on HackerRank.
  • 10 + 10 questions in total.
  • Saving the solution files on the local machine and uploading them as a .zip file.

WEEK3 & WEEK4

Task Assigned

  • Problem solving in SQL on HackerRank.
  • 10 + 10 questions in total.
  • Saving the solution files on the local machine and uploading them as a .zip file.

WEEK5

Task Assigned

  • Advanced concepts using existing SQL databases: scheduling triggers and using pipelines to automate the process.
  • The full task is uploaded in week5_task; use the method described there to run the scheduler.

WEEK6

Task Assigned

  • Concepts of ADF (Azure Data Factory).
  • Configuring FTP and creating an incremental pipeline to automate the given task.
  • The full task is uploaded in week6_task; follow the steps explained in the DOC.
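
The idea behind an incremental pipeline can be illustrated outside ADF with a small Python sketch. This is not the pipeline from week6_task; it only shows the common "watermark" pattern, where each run copies just the files modified since the previous run. The function name `select_new_files`, the file names, and the dates are all hypothetical.

```python
from datetime import datetime, timezone


def select_new_files(files, watermark):
    """Return the files modified after the watermark, and the new watermark.

    `files` is a list of dicts with "name" and "modified" keys
    (an assumed shape, standing in for an FTP directory listing).
    """
    new_files = sorted(
        (f for f in files if f["modified"] > watermark),
        key=lambda f: f["modified"],
    )
    # If nothing is new, keep the old watermark so the next run starts from the same point.
    new_watermark = new_files[-1]["modified"] if new_files else watermark
    return new_files, new_watermark


# Hypothetical listing of files on the FTP server.
listing = [
    {"name": "sales_01.csv", "modified": datetime(2024, 7, 1, tzinfo=timezone.utc)},
    {"name": "sales_02.csv", "modified": datetime(2024, 7, 2, tzinfo=timezone.utc)},
    {"name": "sales_03.csv", "modified": datetime(2024, 7, 3, tzinfo=timezone.utc)},
]

# Watermark stored by the previous run (assumed value).
last_run = datetime(2024, 7, 1, tzinfo=timezone.utc)

to_copy, next_watermark = select_new_files(listing, last_run)
print([f["name"] for f in to_copy])
```

In a real pipeline the watermark would be persisted between runs, for example in a control table, so that a scheduled trigger can pick up where the last run stopped.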

WEEK7

Task Assigned

  • The tasks are explained in the file; Task 1 directs you to load the file.
  • Task 2 cannot be completed without an Azure subscription; a DOC file explains why.
  • Both the Python script and the DOC file are in week7_tasks.

WEEK8

Task Assigned

  • NYC Taxi dataset analysis.
  • Loading the dataset into DBFS, flattening JSON fields, and writing the flattened file as an external Parquet table.
  • The full process is in week8_tasks.
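
The "flatten JSON fields" step can be sketched in plain Python. This is only an illustration of the idea, not the code in week8_tasks, which presumably runs on Databricks; the function name `flatten` and the field names in the sample record are hypothetical and not taken from the NYC Taxi dataset.

```python
def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into a single-level dict.

    Nested keys are joined with `sep`, e.g. {"fare": {"tip": 2.0}}
    becomes {"fare_tip": 2.0}. Lists are left untouched.
    """
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the prefix along.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical nested trip record.
trip = {
    "vendor_id": 1,
    "fare": {"amount": 12.5, "tip": 2.0},
    "pickup": {"location": {"borough": "Queens"}},
}

flat_trip = flatten(trip)
print(flat_trip)
```

Once every record has a single level of columns like this, it maps directly onto a tabular format such as Parquet.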

PROJECT