Three facts about Stack Overflow survey data



Introduction

In this short post, I would like to draw your attention to the Stack Overflow developer survey. You might wonder whether parents' education influences their children's education, especially among people working in IT. Is race related to an individual's income? Finally, can a model with fewer features than the original still explain which features have an impact on developers' salaries? Let's dig in and find out.

Part 1: The relationship between race and salary


First, a large share of respondents fall into the category "White or of European descent". From here on, I will call this category "White" for short and compare salary charts for "White" respondents and everyone else ("Other").
According to the bar plot, "White" developers have a higher average salary than "Other" developers. An average hides the shape of the data, though, so let's also look at the salary histogram of each group.
The histogram for "White" developers is not heavily concentrated on the left, whereas a large proportion of "Other" developers sit on the left side of the chart, i.e. at lower salaries.
Therefore, "White" developers earn more on average than others, at least among the developers who responded to the Stack Overflow survey.
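The comparison above can be sketched in a few lines of pandas. This is a minimal illustration, not the exact notebook code: the column names "Race" and "Salary", the exact-match rule for the "White" group, and the helper name `add_race_group` are assumptions, and the rows are made-up toy values rather than real survey data.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT real survey values.
# The column names "Race" and "Salary" are assumed to match the survey export.
df = pd.DataFrame({
    "Race": [
        "White or of European descent",
        "White or of European descent",
        "South Asian",
        "East Asian",
        "White or of European descent",
        None,
    ],
    "Salary": [60000.0, 80000.0, 20000.0, 40000.0, None, 30000.0],
})


def add_race_group(frame):
    """Return a copy with a two-level "RaceGroup" column: "White" or "Other".

    Rows with a missing race or salary are dropped first.
    """
    out = frame.dropna(subset=["Race", "Salary"]).copy()
    is_white = out["Race"] == "White or of European descent"
    out["RaceGroup"] = is_white.map({True: "White", False: "Other"})
    return out


grouped = add_race_group(df)

# Average salary per group (the numbers behind a bar plot).
mean_salary = grouped.groupby("RaceGroup")["Salary"].mean()
print(mean_salary)

# Optional plots, if matplotlib is installed:
# mean_salary.plot(kind="bar")
# grouped.hist(column="Salary", by="RaceGroup", bins=30, sharex=True)
```

Because the group is defined by an exact string match, any respondent whose answer differs from that label in any way ends up in "Other". Whether that is the right rule depends on how the race answers are stored in the file.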

Part 2: Does parents' education have an effect on their offspring's education?
It is worth noting that in each bracket, the majority of developers follow their parents' education path. For instance, the very high bar in the Bachelor's degree bracket indicates that among developers who hold a Bachelor's degree, most have parents who hold one as well. The same pattern appears in the Master's degree bracket, in the bracket without a Bachelor's degree, and even in the High school bracket.
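One way to check this pattern numerically is a row-normalised cross-tabulation of the developer's own education against the parents' education. The sketch below assumes the two columns are called "FormalEducation" and "HighestEducationParents" and that both use the same labels. Those names, the helper `parent_share_by_own_education`, and the toy rows are illustrative assumptions.

```python
import pandas as pd

# Toy rows for illustration only; NOT real survey data.
# Column names and labels are assumptions about the survey export.
df = pd.DataFrame({
    "FormalEducation": [
        "Bachelor's degree", "Bachelor's degree", "Bachelor's degree",
        "Master's degree", "Master's degree", "High school",
    ],
    "HighestEducationParents": [
        "Bachelor's degree", "Bachelor's degree", "Master's degree",
        "Master's degree", "Master's degree", "High school",
    ],
})


def parent_share_by_own_education(frame, own_col, parent_col):
    """Cross-tabulate own vs. parents' education.

    Each row (one bracket of the developer's own education) is
    normalised to sum to 1, so cells are shares within that bracket.
    """
    clean = frame.dropna(subset=[own_col, parent_col])
    return pd.crosstab(clean[own_col], clean[parent_col], normalize="index")


shares = parent_share_by_own_education(
    df, "FormalEducation", "HighestEducationParents"
)
print(shares.round(2))

# For each bracket, the parents' education level with the largest share.
most_common_parent = shares.idxmax(axis=1)
print(most_common_parent)
```

If developers tend to follow their parents' path, the largest share in each row sits on the diagonal, i.e. `most_common_parent` matches the row label.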

Part 3: Predicting salary
In the original model from the Udacity education team, a handful of features stand out among the top 20 by importance. So let's see whether the results are still significant if we trim the fat, i.e. remove the other features. In this part, I train a model that keeps only these important features: "Country", "Currency", "YearsCodedJob", "Overpaid", "CompanyType", "YearsProgram".

Using a decision tree regressor, the R² score on the test set is approximately 0.54.
Furthermore, the feature importances of the top 10 features are listed below:
As can be seen, the feature importances are much the same as in the original model.
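A 'lite' model of this kind could be put together roughly as follows with scikit-learn. This is a sketch under stated assumptions, not the original notebook: the target column name "Salary", one-hot encoding via `pd.get_dummies`, the 70/30 split, the default tree settings, and the helper `fit_lite_model` are all assumptions, and the data is synthetic, so the score it prints says nothing about the 0.54 reported above.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# The six features kept in the 'lite' model (named in the post).
FEATURES = ["Country", "Currency", "YearsCodedJob",
            "Overpaid", "CompanyType", "YearsProgram"]


def fit_lite_model(frame, target="Salary", seed=42):
    """Fit a decision tree on the six features; return model, R^2, importances."""
    clean = frame.dropna(subset=[target])
    # One-hot encode the categorical answers; missing answers get their own column.
    X = pd.get_dummies(clean[FEATURES], dummy_na=True, dtype=float)
    y = clean[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    model = DecisionTreeRegressor(random_state=seed)
    model.fit(X_train, y_train)
    score = r2_score(y_test, model.predict(X_test))
    importances = pd.Series(
        model.feature_importances_, index=X.columns
    ).sort_values(ascending=False)
    return model, score, importances


# Synthetic toy data (NOT survey data): salary depends mostly on country.
rng = np.random.default_rng(0)
n = 200
country = rng.choice(["United States", "India"], size=n)
toy = pd.DataFrame({
    "Country": country,
    "Currency": rng.choice(["U.S. dollars ($)", "Indian rupees"], size=n),
    "YearsCodedJob": rng.choice(["1 to 2 years", "5 to 6 years"], size=n),
    "Overpaid": rng.choice(["Somewhat underpaid", "Somewhat overpaid"], size=n),
    "CompanyType": rng.choice(["Privately-held", "Publicly-traded"], size=n),
    "YearsProgram": rng.choice(["2 to 3 years", "9 to 10 years"], size=n),
    "Salary": np.where(country == "United States", 100000.0, 20000.0)
              + rng.normal(0.0, 1000.0, size=n),
})

model, score, importances = fit_lite_model(toy)
print("R^2 on the held-out toy rows:", round(score, 3))
print(importances.head(10))
```

Note that after one-hot encoding, importances are reported per dummy column (for example one per country) rather than per original survey question. Summing them by prefix is one option if a per-question ranking is wanted.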

Conclusion
In this article, the following conclusions were drawn from the Stack Overflow survey data:
1. Developers of White or European descent seem to earn higher salaries than others
2. Developers in the survey tend to follow their parents' education
3. The 'lite' model produces results similar to those of the original model

So, if you want your children to study further in the future, consider furthering your own education.





