Analysis of Per Pupil Spending as Reported by LAUSD

To see raw code, please visit my project repository on GitLab, or click here!

Special Election, Measure EE

For those of you outside of California, the school system for the city of Los Angeles has a reputation of both being underfunded and also overfunded. Many people argue that we should be spending more per pupil while others cite gross incompetence of the management. A couple months ago, the teachers for LAUSD went on strike demanding more funding and smaller class sizes and management folded after a short while. Eventually LAUSD officials supposedly asked for money from the city’s emergency fund to cover the new expenditures that were agreed to during the strike and the city declined. Measure EE was created to add $6B in funding over 12 years to the school system by taxing property at various rates.

This led to some angry rhetoric on both sides; with figures being thrown around with seemingly no source. I would read through what I could to better understand the situation as I did not read the measure or follow it closely. Eventually I came across the following graph which sort of horrified me:

Figure 1

Not to pick sides, but I tried to explain to them why their analysis is inherently flawed for countless reasons. I wanted to then see if I could replicate it and perform proper analysis.

Finding the Data

California actually supplies a lot of data for various departments online and is quite accessible. The issue comes in finding the exact data that the person in question used. I was ultimately unsuccessful, however I did find a nice repository of similiar information here: LAUSD Open Data.

The data sets that I used can be found in the GitLab repository that is linked at the top of this post.

The Analysis

After reading in the data and pairing it using the Funds.Center variable. A Cost Per Pupil was calculated using the cohort size divided by the total aggregated cost for the location. Naturally this isn’t really correct as the cohort likely only represents the population that is eligible to retire, it is simple enough for now until we can find more robust data. We were able to produce our first plot:

Plot 1

Yikes! That doesn’t look too good. Clearly something is wrong here, and in our quest for answers we can try to get rid of some of these problem locations by identifying our outliers.


> head(plot_dat[order(plot_dat$CostPerPupil,decreasing = T),])
   CostCenterCode LocationCode      CDSCode                                                       School
9         1194901         1949 1.964734e+13                                               Marlton School
6         1194101         1941 1.964734e+13               Benjamin Banneker Career and Transition Center
8         1194801         1948 1.964734e+13                Charles Leroy Lowman Special Education Center
2         1191001         1910 1.964733e+13                  Joaquin Miller Career and Transition Center
3         1191401         1914 1.964733e+13 Joseph Pomeroy Widney Career Preparatory & Transition Center
12        1195701         1957 1.964734e+13                 Ernest P Willenberg Special Education Center
   SchoolYear NumberInCohort NumberOfGraduates CohortGradRate Total.Cost CostPerPupil
9     2017-18             11                 5          0.455    1454765    132251.36
6     2017-18             16                 0          0.000    1388089     86755.56
8     2017-18             15                 0          0.000    1249432     83295.47
2     2017-18             35                 0          0.000    1752248     50064.23
3     2017-18             40                 0          0.000    1914170     47854.25
12    2017-18             35                 0          0.000    1413495     40385.57

Interesting! As it turns out, the Marlton School is a special school in LAUSD that serves deaf and hard of hearing students. I’m not sure about the remaining schools on the list, but I would imagine that the need for more money makes sense. Using some R magic, we’ll remove all of our outliers and plot again:

Plot 2

It looks like there isn’t much of a relationship here at all. Although we could probably guess, let’s see what we get for a correlation coefficient:

> cor(plot_dat2$CostPerPupil,plot_dat2$CohortGradRate)
[1] -0.1283272


So it doesn’t look like there isn’t any sort of relationship that exists between spending and graduation rates. However, we were able to do our analysis in a reproducible manner and is not some sort of excel beast. It’s not a matter of which way you lean personally on the matter, but incorrect analysis is just as dangerous as good analysis is insightful – even though we get the same result in this instance.

In the future I’d like to take a look at why there is seemingly two clusters. Stay tuned!

Full R Code Here

# Set Working Directory
# Load Graduation Data
graduation_dat = read.csv("Data/CohortGradRates20172018.csv",header=T)
# Load Fiscal Data
fiscal_dat = read.csv("Data/Budget Transparency Detailed Report FN0 2019.csv",header=T)

# Explore Which Columns we need

# We'll use "Funds.Center" to help link our fiscal data to our graduation data which shares the same codes. First we will need to aggregate all the fiscal data for each school.

fiscal_dat_agg = aggregate(as.numeric(fiscal_dat$Total.Cost), by=list(Category=fiscal_dat$Funds.Center),FUN=sum)

# Rename our columns for clarity
names(fiscal_dat_agg) = c("CostCenterCode","Total.Cost")

# Merge our DFs
plot_dat = merge(graduation_dat,fiscal_dat_agg,by="CostCenterCode")

# Create Cost Per Pupil
plot_dat = cbind(plot_dat,plot_dat$Total.Cost/plot_dat$NumberInCohort)
names(plot_dat)[10] = "CostPerPupil"

# Plot our information
plot(plot_dat$CostPerPupil,plot_dat$CohortGradRate, pch=20)

# what are those crazy high values?
head(plot_dat[order(plot_dat$CostPerPupil,decreasing = T),])

# Let's get rid of them for now and discuss them later
bp_costperpupil = boxplot(plot_dat$CostPerPupil)
plot_dat2 = plot_dat[-which(plot_dat$CostPerPupil>bp_costperpupil$stats[5]),]

# Plotting again...
plot(plot_dat2$CostPerPupil,plot_dat2$CohortGradRate, pch=20,
     main = "Cost Per Pupil (Derived) verus Cohort Graduation Rate (Reported)",
     ylab = "Graduation Rate (%)", xlab = "Cost Per Pupil ($USD)"

# Doesn't look good, but what is our Correlation Coefficient