Being a student in the DSI program means completing a lot of homework. The most challenging assignments, in my humble opinion, are the projects. Every two weeks we are assigned a project, which we later have to present to our instructors and classmates. My first project was terrifying, not because the problem was complicated, but because I was not sure what to expect, and I drew a blank when trying to come up with something as basic as a “problem statement”.
Here I was, thinking that being a Data Scientist meant the problem would be handed to me, along with the specific questions that needed to be answered. It turns out I was quite wrong about that. I have since learned that a Data Scientist’s job can be quite broad: it may involve much more than analyzing a dataset. A Data Scientist may have to find the data herself, do additional research into the subject matter the data covers, and even figure out which questions should be answered in the first place. A company may simply say to a Data Scientist: “Here is the data. Please figure out how we can make more money.”
And so, with my newfound revelation, I spent most of my project time figuring out what my problem statement should be and which questions would be most relevant to solving the problem. The second challenge was not the coding, as I had expected. It was the math! Data Scientists are not merely programmers. They have statistical knowledge that allows them to look at data in a way that others cannot. They can notice things just by looking at the summary statistics for a given dataset. I found this very interesting and scary at the same time, since coding is the stronger of my two skills.
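To give a sense of what I mean by noticing things in summary statistics, here is a minimal sketch in Python. It is not code from my project: the `daily_sales` numbers and the `summarize` helper are made up purely for illustration, and it uses only the standard library's `statistics` module.

```python
import statistics

# Hypothetical daily sales figures, invented for illustration only.
daily_sales = [120, 135, 128, 140, 132, 125, 980]


def summarize(values):
    """Return a few basic summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


summary = summarize(daily_sales)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this invented example the mean lands far above the median, which hints that a single unusually large value is pulling the average up. That is the kind of thing a trained eye can spot before drawing a single graph.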
With the math somewhat figured out, I moved on to coding, which involved data cleaning, exploratory analysis, and visualization. I enjoyed this stage very much, even though it did not take as much time as the other parts of the project. Coding in Python is the fun part for me, and being able to visualize the data can lead to very interesting findings. After answering the questions relevant to the problem statement, I created the slides for the presentation, which included the answers to these questions backed up by their respective graphs. I learned a lot from my first project, and from seeing my classmates’ projects as well, and I feel confident that my future projects will improve significantly.