Author: Super Admin

Playing with R Examples on KDNuggets – Developer Salaries

I had to play around in RStudio a bit when I saw some example R scripts analyzing the results of a 2019 survey of 88,883 software developers.

First, I downloaded their data file, a zip archive containing a README, some CSV files, a PDF, and not much else.

Looking at the sample scripts, I could quickly see that pulling data into the program was about as easy as it is in Python, whether in a Jupyter Notebook or JupyterLab. If you would like to follow along, I would encourage you to visit the original article on KDnuggets.

The original code I reference here was written by Tomaž Weiss, Data Scientist.

Immediately, upon seeing the first example, I had to try it and then modify it a little.  That’s not a bad comment on his R coding skills. It’s a good comment. It means his program was very informative, very instructional, and very interesting for getting a completely new person interested in R.

Here is how it looked before I modified it:

library(tidyverse)
library(countrycode)

# data import -------------------------------------------------------------

data <- read_csv("data/survey_results_public.csv")

# data preparation -----------------------------------------------------------

data_r <-    
   data %>% 
   filter(grepl('^R$', LanguageWorkedWith) | grepl(';R$', LanguageWorkedWith) | 
            grepl(';R;', LanguageWorkedWith) | grepl('^R;', LanguageWorkedWith)) %>% 
   filter(MainBranch %in% c('I am a developer by profession', 
                            'I am not primarily a developer, but I write code sometimes as part of my work')) %>% 
   filter(Employment %in% c('Employed full-time', 
                            'Employed part-time', 
                            'Independent contractor, freelancer, or self-employed')) %>% 
   filter(!grepl('Other Country', Country)) %>% 
   filter(!is.na(Country), !is.na(ConvertedComp), ConvertedComp > 0) 

The first thing I noticed was that the R language was hardcoded into the initial filter.  Couldn’t we put “R” into a variable that we can change at will and use a variable, say, “lang” for “language” instead?  Then we could change lang to Python or Scala or Objective-C or any other language.

# data preparation -----------------------------------------------------------

lang <- 'C++'

data_r <-
   data %>%
   # LanguageWorkedWith is a ';'-separated list. Wrapping both sides in ';' and
   # matching literally (fixed = TRUE) covers first, middle, last, and only
   # entries, and stops the '+' in 'C++' from being read as a regex operator.
   filter(grepl(paste0(';', lang, ';'), paste0(';', LanguageWorkedWith, ';'),
                fixed = TRUE)) %>%

Later, in another code snippet, “R” was again hardcoded into the title of the graph. By editing the ggtitle() call a little, the title can follow the lang variable as well.

# boxplot visualization ---------------------------------------------------
 data_r %>% 
   left_join(countries, by = "Country") %>% 
   inner_join(country_n_r_users %>% filter(n_r_users >= 5), by = "Country") %>% 
   mutate(Country = reorder(Country, ConvertedComp, median)) %>% 
   ggplot(aes(x = Country, y = ConvertedComp, fill = continent)) + 
   geom_boxplot(outlier.size = 0.5) +
   ylab('Annual USD salary') +
   coord_flip(ylim = c(0, 200000)) +
   scale_y_continuous(breaks = seq(0, 200000, by = 20000), 
                      labels = function(x) format(x, big.mark = ",", decimal.mark = '.', scientific = FALSE)
   ) +
   ggtitle(paste0("Distributions of ",lang," Users Salaries by Country")) +
   theme(plot.title = element_text(hjust = 0.5)) +
   scale_fill_discrete(name = "Continent")

Configuring versus Hardcoding

You can see something that most experienced programmers learn early in their career: a principle called DRY or Don’t Repeat Yourself.

New programmers often hardcode the same information throughout their programs. It is questionable whether it is even wise to put data into programs at all. Usually, both program data and configuration data belong in configuration files or databases, not hardcoded into program files.

Let me give you an example in C.

#include <stdio.h>

/* Here's a good example of how a beginner programmer writes code. */
int main(void) {
    printf("I love dogs\n");
    printf("I love cats\n");
    printf("I love fish\n");
    printf("I love horses\n");
    return 0;
}

To the beginning programmer, this looks awesome. He compiles it and it runs! YES! Celebrate!

Ok. Now he wants to add goats. Easy. Copy the line about horses, change horses to goats, save it, recompile, and it runs!

After a while, he has 100 of these lines for 100 animals, and he decides he hates all animals. Or he wants this to be about someone else and not himself. Perhaps he wants to change “I” to “My wife” and “love” to “hates”.

So, he goes into the editor and does a search and replace from “I” to “My wife”. Elsewhere in the program, he has a variable named “Info”. It becomes “My wifenfo”. Elsewhere, there’s mention of “Information Systems”. It becomes “My wifenformation Systems”. He tries to undo his errors and undoes two out of seven of them. The other five are scrambled worse.

It sounds funny until you experience it. This poor programmer tries to abandon his changes but hits save accidentally, or his editor is on auto-save. The old copy is gone. It would be on the backups, if the program were not so new.

He starts over.

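#include <stdio.h>

/* First Step: Dividing the program into functions */
/* Perhaps excessively */

/* The functions are defined before main() so the compiler has seen them
   by the time they are called. */
void dog(void) {
    printf("I love dogs\n");
}

void cat(void) {
    printf("I love cats\n");
}

void fish(void) {
    printf("I love fish\n");
}

/* These are for "My Wife Hates" */
void dog2(void) {
    printf("My wife hates dogs\n");
}

void cat2(void) {
    printf("My wife hates cats\n");
}

int main(void) {
/*    dog(); cat(); -- commented out */
    dog2(); cat2();
    return 0;
}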

Hey. We tried it and it works! (Famous last words of many new programmers).

It still is not very flexible. It’s nice that you can pick and choose, but the program becomes unwieldy for handling the likes and hates of hundreds of animals for hundreds or thousands of people.

#include <stdio.h>

/* Second Step: Applying DRY, reducing the number of functions by making them
   more general, but without going overboard. */
void loves(const char *person, const char *animal) {
    printf("%s loves %s\n", person, animal);
}

void hates(const char *person, const char *animal) {
    printf("%s hates %s\n", person, animal);
}

int main(void) {
    loves("My wife", "dogs"); hates("My wife", "cats");
    loves("I", "dogs"); hates("I", "cats");
    return 0;
}

Ouch. The first two lines look good, but what about the last two?

$ ./a.out
 My wife loves dogs
 My wife hates cats
 I loves dogs
 I hates cats
$

That problem is easily fixed. Just make loves() and hates() into one function and add a third parameter for “love”, “loves”, “hate”, “hates”, “grooms”, “feeds”, and so on.

It would take several pages like this to cover the problems with this simple program. I almost called it a useless program, but it is not useless. Its purpose is to illustrate the value of some programming practices. And it illustrates a way for beginners to improve their coding.

One thing you will notice right away is that there is still data in the code, so whenever you change what you want to do, you have to edit and recompile before you can run the program again. There is also a problem with having output functions like printf scattered throughout the program. It is better to write functions that return values, so that those values can then be turned into output for a webpage, a printer, a widget, and so on.

For those interested in software engineering, read up on best practices for the language you are using. You will find that most of those practices apply to almost any programming language.