Assessing Local Control and Accountability Plans (LCAPs) Using Generative AI

This report uses generative AI to analyze thousands of LCAP goals and actions across California. It raises important questions about how local planning tools could become more measurable, strategic, and useful for improvement.

In addition to simplifying California public school funding calculations, the Local Control Funding Formula (LCFF) departed from a state-mandated accountability system wherein legislators dictated public schools’ expected goals and progress. Premised on Governor Jerry Brown’s adherence to the principle of subsidiarity (Bae and Stosich 2018; Wright 2017), LCFF designated local educational agencies (LEAs) as the primary budgetary decision-making units that must strategically plan future expenditures before receiving annual funds from the state. With input from community stakeholders, LEAs must write and publish a Local Control and Accountability Plan (LCAP) each academic year. In their LCAPs, LEA administrators identify goals, actions, services, and predicted expenditures aligned with statewide education priorities to improve outcomes for all students, and especially higher-need groups like economically disadvantaged students, English learners, and foster youth.

Serving simultaneously as a tool for education planning, budgeting, and accountability, LCAPs are information-dense documents that are intended to publicly record LEAs’ educational priorities and resource allocation decisions. However, several challenges have rendered the information contained in LCAPs minimally accessible for large-scale, systematic research. 

The primary barrier to systematic retrieval and analysis of LCAP data stems from the format in which the documents are published. LCAPs originate as templates provided to LEAs by the state as fillable tables and text fields in Microsoft Word format. LEAs complete these templates with considerable variation in length and formatting, and then publish the documents as PDF files on their organizations’ websites. Early attempts to programmatically extract LCAP data relied on optical character recognition (OCR) applied to PDF files, a method that works reasonably well for clean, typed documents but struggles with the inconsistencies introduced by district-level formatting choices, multi-column layouts, embedded tables, and scanned pages. Other text extraction tools, such as R’s pdftools package, are reasonably effective at recovering text from digitally produced (as opposed to scanned) PDF documents, but neither text extraction tools nor OCR can reliably encode document structure. As a result, analysts may be able to extract raw text from LCAPs, but are left with no indication of the relationships among the goals, actions, and expenditures that give LCAP data their greatest analytic value.
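To make the structural loss concrete, consider a minimal sketch of what a generic PDF text extractor might return for a two-goal LCAP. The goal and action text below is invented for illustration; the point is that once table cells collapse into flat lines, any mapping of actions and expenditures back to their parent goals rests on fragile ordering assumptions:

```python
# Hypothetical flat text, as a generic PDF extractor might return it.
# Table structure has collapsed into undifferentiated lines, so the
# goal -> action -> expenditure hierarchy is no longer explicit.
raw_text = """Goal 1 Improve literacy outcomes for English learners
Action 1.1 Hire reading intervention specialists
Total Funds 250000
Goal 2 Increase attendance rates
Action 2.1 Expand family outreach program
Total Funds 80000"""

lines = raw_text.splitlines()

# A naive line-by-line pass can label each line's type...
kinds = ["goal" if line.startswith("Goal")
         else "action" if line.startswith("Action")
         else "expenditure"
         for line in lines]
print(kinds.count("goal"))  # 2 goals detected

# ...but attaching each action and expenditure to the right goal depends
# entirely on line ordering, which real LCAP formatting variation
# (multi-column layouts, page breaks, repeated headers) routinely breaks.
```

The sketch shows why raw extraction alone is insufficient: the labels are recoverable, but the relationships among them are not.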

Some researchers and policy organizations have employed manual or “hand coding” approaches to LCAP data extraction, which can yield high-quality, reliable data tailored to specific research questions. However, hand coding is labor-, time-, and resource-intensive, making it best suited to studies of relatively small LCAP samples rather than comprehensive analyses of California’s public education landscape.

Despite LCAPs’ significance as planning tools and accountability documents, the data challenges described above have led to a scenario in which we know relatively little about what California’s LEAs actually write in their LCAPs. Do districts in different geographic contexts or with different student populations articulate different goals and priorities? Do the goals set out by charter school LEAs look different from those described by traditional public school districts? How do the actions districts describe connect to their stated goals, and to what extent do districts with similar goals vary in the sorts of actions they propose? The necessary data for answering these sorts of questions exists within publicly available LCAPs, but extracting such information in usable form has been prohibitively difficult. 

The parsing tool described in the following section addresses this gap directly. By leveraging large language models to convert LCAP PDFs into structured JSON, the tool makes it possible to extract goal text, action descriptions, and associated metadata from large document collections, thereby unlocking a corpus of LCAP text suitable for statewide education policy analysis.
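As a concrete illustration of what “structured JSON” could look like in this context, the fragment below sketches one plausible output schema for a single parsed goal. The field names (lea_name, goal_text, total_funds, and so on) are assumptions made for this example, not the tool’s actual output format:

```python
import json

# Illustrative assumption: one plausible JSON record an LLM-based parser
# could be prompted to emit for a single LCAP goal. All field names here
# are invented for this sketch.
sample_output = """
{
  "lea_name": "Example Unified School District",
  "lcap_year": "2023-24",
  "goals": [
    {
      "goal_number": 1,
      "goal_text": "Improve literacy outcomes for English learners",
      "actions": [
        {
          "action_number": "1.1",
          "description": "Hire reading intervention specialists",
          "total_funds": 250000
        }
      ]
    }
  ]
}
"""

parsed = json.loads(sample_output)

# Unlike flat extracted text, the nesting makes the goal-action-expenditure
# relationships explicit and machine-readable.
first_goal = parsed["goals"][0]
print(first_goal["actions"][0]["total_funds"])  # 250000
```

Once LCAPs are represented this way, each action is unambiguously tied to its parent goal and LEA, which is what makes corpus-wide queries across thousands of plans tractable.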