I’m using this page to document my ongoing research on the notice and comment process in US federal regulations. Putting it up on GitHub is an experiment and a work in progress. I hope it will make my work accessible and transparent.
Every year federal regulators in the US receive thousands of public comments on proposed regulations, with content ranging from simple pleas to sophisticated technical and legal arguments. Submitters include thousands of companies, non-profit organizations, unions, and government entities, as well as millions of individual citizens who submit comments as part of coordinated campaigns. Under the Administrative Procedure Act, US federal agencies must read every comment and respond to the main points raised by commenters when the agencies publish final regulations.
Regulatory comments occasionally appear in the news, such as when the FCC received millions of comments on its Net Neutrality proposal (many of which turned out to be fake), or when the EPA received millions of comments on major regulations like the Clean Power Plan. But comments are submitted every day, largely unnoticed, potentially influencing a bewildering array of regulatory changes.
I’m interested in these comments for a number of reasons:
Regulations.gov was launched in 2003, but most agencies didn’t start using it until about 2007. However, the early adopters (the EPA and the federal departments) publish a large fraction of all proposed rules. By 2005, most proposals published in the Federal Register directed readers to comment on Regulations.gov. A small number of agencies have also digitized older comments, but this data is incomplete.
The number of comments posted on Regulations.gov has grown drastically over the last 10 years - almost 100-fold since 2005. Most of the growth is driven by simpler comments with no attachments and repetitive comments associated with campaigns.
A big part of this project is simply figuring out who the comment submitters are. It’s a complicated process that I’ll document in detail soon. The key thing to understand is that it is imprecise: the raw data is large and messy, and it is impractical to review every comment manually. I use a custom machine learning algorithm to parse comment metadata and extract organization names. To gauge accuracy, I manually annotated a random sample of 1,000 comments to use as a test set. At the moment accuracy is about 90%, and the algorithm finds about 260,000 unique names that look like valid organizations. The Venn diagram below shows the overlap between organizations that comment on Regulations.gov, federal lobbying clients, and Compustat North America companies. Note that linking names across datasets is also an imprecise process, for which I use another custom machine learning algorithm.
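The test-set accuracy figure above is a simple per-comment comparison against the manual annotations. A minimal sketch of that evaluation (the names and sample data here are hypothetical; the actual extraction model is not shown):

```python
def extraction_accuracy(predictions, gold):
    """Fraction of comments where the extracted organization name exactly
    matches the manually annotated label (None = individual / no organization)."""
    assert len(predictions) == len(gold)
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Hypothetical example: a near-miss on a name variant counts as an error,
# which is exactly why cross-dataset name linking needs its own model.
sample_preds = ["Acme Corp", None, "Sierra Club", "Acme Corp"]
sample_gold  = ["Acme Corp", None, "Sierra Club", "ACME Corporation"]
print(extraction_accuracy(sample_preds, sample_gold))  # 0.75
```

Exact string matching is a deliberately strict criterion; a fuzzier matcher would trade precision for recall.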
The plot below shows the relationship between the total number of comments submitted and money spent on federal lobbyists for organizations that do both during the years 2007-2017. There is a strong positive correlation between submitting comments and hiring lobbyists.
Another important question we can examine with Compustat data is whether the choice to comment depends on the structure of a firm’s industry. Previous research by my advisors Matilde Bombardini and Francesco Trebbi has shown that firms in more competitive industries rely more on trade associations to coordinate lobbying. Do the same patterns appear in the choice to comment? Or do the different costs of commenting lead firms to comment on their own more often? At this point I don’t have data on trade associations, but we can still examine whether there are obvious differences between commenting in competitive and concentrated industries.
The first plot below shows the probability that a firm comments or hires a lobbyist as a function of the Herfindahl index of the 50 largest firms in that firm’s primary industry. Firms in concentrated industries (high Herfindahl index) are more likely to comment and hire lobbyists, but the probabilities are extremely similar.
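For reference, the concentration measure on the x-axis is the standard Herfindahl index, the sum of squared market shares. A minimal sketch, assuming shares are computed within the top 50 firms by sales (whether the denominator is top-50 sales or total industry sales is my assumption here):

```python
def herfindahl_top50(sales):
    """Herfindahl index of the 50 largest firms in an industry:
    sum of squared market shares, with shares computed within the top 50."""
    top = sorted(sales, reverse=True)[:50]
    total = sum(top)
    return sum((s / total) ** 2 for s in top)

# A monopoly-like industry vs. one split evenly among four firms:
print(herfindahl_top50([100]))      # 1.0
print(herfindahl_top50([10] * 4))   # 0.25
```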
One issue with the above plot is that we know industry concentration and size are correlated. The plots below separate concentration and size as independent axes, and show probability estimates as heatmaps in this 2-d space. It appears that the relationship between lobbying and concentration is driven mainly by firm size. The similarity between the two heatmaps is quite surprising to me. Among Compustat firms at least, the choice to comment and the choice to hire a lobbyist look almost identical.
As a researcher, one of the most interesting features of the notice and comment process is the fact that most regulations are published in multiple stages, allowing us to see how they developed over time.
Regulations are years in the making, and agencies usually publish multiple documents in the Federal Register for each regulatory change. In the simplest case, agencies publish exactly two documents for each regulatory change:
A Proposed Rule. This document outlines the agency’s regulatory intent, including what it plans to change about the regulatory environment, why it is making the change, and the legal basis for the agency’s authority to do so. Some proposed rules include alternative options that the agency presents for feedback. All proposals provide information about how to contact the agency to comment on the regulation, and (usually) a 90- or 120-day period during which the agency is formally accepting comments.
A Rule. This document describes the final form of the regulatory change and the date that the change is effective. Agencies are also required to discuss and respond to all the comments they have received (at least at the level of broad themes). Some rules request comments to aid the agency in designing the next iteration of the regulation.
This process can be complicated by additional steps like Comment Extensions, Advance Notices of Proposed Rulemaking, Interim-Final Rules, Direct-Final Rules, Corrections, Re-Prints, or a variety of Notices.
I call a sequence of documents that are all related to the same regulatory change a rulemaking stream. I call a stream complete if it contains both a Proposed Rule and a Rule, and I call a stream simple if it contains exactly one Proposed Rule, one Rule, and no further documents except Notices and Comment Extensions. For most of the analysis below I restrict the sample to simple streams.
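The two stream definitions above can be written out directly. A minimal sketch, assuming each stream is a list of document type labels (the label strings are my assumptions; the real data uses Federal Register document categories):

```python
def is_complete(doc_types):
    """A stream is complete if it contains both a Proposed Rule and a Rule."""
    return "Proposed Rule" in doc_types and "Rule" in doc_types

def is_simple(doc_types):
    """A stream is simple if it contains exactly one Proposed Rule, one Rule,
    and nothing else except Notices and Comment Extensions."""
    allowed_extras = {"Notice", "Comment Extension"}
    extras = [t for t in doc_types if t not in ("Proposed Rule", "Rule")]
    return (doc_types.count("Proposed Rule") == 1
            and doc_types.count("Rule") == 1
            and all(t in allowed_extras for t in extras))

stream = ["Proposed Rule", "Comment Extension", "Rule"]
print(is_complete(stream), is_simple(stream))  # True True
```

Note that every simple stream is complete by construction, but not vice versa.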
Unfortunately there is no formal definition for how documents should be grouped together, and agencies are sloppy about documenting the relationships. I have had to develop my own algorithm for grouping documents into streams based on whatever identifiers are available. While most groupings are fairly unambiguous, many edge-cases and poorly documented relationships exist. In these cases my algorithm attempts to infer the most likely relationship.
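One natural way to group documents that share identifiers (docket numbers, regulation identifier numbers, explicit cross-references) is to treat shared identifiers as links and take connected components. This is a sketch of that idea, not the actual algorithm used here, which also has to resolve ambiguous and poorly documented relationships:

```python
from collections import defaultdict

def group_into_streams(doc_ids):
    """doc_ids: dict mapping document -> set of identifiers.
    Returns a list of streams (sets of documents connected by shared identifiers),
    using a simple union-find over documents."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Collect documents by identifier, then union all documents per identifier.
    by_identifier = defaultdict(list)
    for doc, ids in doc_ids.items():
        find(doc)  # register even isolated documents
        for ident in ids:
            by_identifier[ident].append(doc)
    for docs in by_identifier.values():
        for other in docs[1:]:
            union(docs[0], other)

    streams = defaultdict(set)
    for doc in doc_ids:
        streams[find(doc)].add(doc)
    return list(streams.values())

# Hypothetical documents: the Rule shares a RIN with the Proposed Rule and a
# docket number with a Notice, so all three end up in one stream.
docs = {"prop": {"RIN-1"}, "rule": {"RIN-1", "FR-2"},
        "notice": {"FR-2"}, "misc": {"RIN-9"}}
print(sorted(len(s) for s in group_into_streams(docs)))  # [1, 3]
```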
The most important part of a Rule is a section of legal text that uses codified language and formatting to describe changes to the Code of Federal Regulations (CFR). CFR text is organized into hierarchical numbered paragraphs. The legal text in a Rule describes which paragraphs should be removed, added, or modified, and presents all the new and modified CFR text. In some sense, this is all a rule is: a collection of edits to the CFR with a date for when the changes take effect. Many Proposed Rules also include draft legal text.
At the same time, most of the text in Rules and Proposed Rules is not legal text. Agencies spend many pages summarizing the rule in plain language, discussing the motivation and legal foundation of the rule, responding to comments, and dealing with other procedural matters. I call this non-legal text collectively discussion text.
The plot below shows the distribution of the lengths of legal and discussion text for all simple streams between 1994 and 2017, measured by number of paragraphs (note that discussion paragraphs are typically longer than legal paragraphs). There is a broad range of document sizes and legal complexity. Some documents are very long.
When proposals include draft legal text, we have an opportunity to examine how the legal text changes between the Proposed Rule and the final Rule. I’ll present data based on two measures of change.
Legal paragraph growth: I define the growth in legal paragraphs as the change in the number of legal paragraphs divided by the average number of legal paragraphs in the two documents.
Legal paragraph Jaccard index: This measures the fraction of paragraphs that are modified between documents. It is defined as the number of unique paragraphs that appear in both documents divided by the number of unique paragraphs that appear in either document. It is therefore a measure of similarity: a Jaccard index of 1 indicates that no paragraphs have been modified or dropped between documents, and a Jaccard index of 0 indicates complete replacement (or at least small changes to every paragraph).
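Both measures are straightforward to compute once the legal text is split into paragraphs. A minimal sketch, representing each document as a list of paragraph strings (the sample paragraphs are hypothetical):

```python
def paragraph_growth(proposal, final):
    """Change in the number of legal paragraphs, divided by the
    average number of legal paragraphs in the two documents."""
    avg = (len(proposal) + len(final)) / 2
    return (len(final) - len(proposal)) / avg

def paragraph_jaccard(proposal, final):
    """|intersection| / |union| over the sets of unique paragraphs."""
    a, b = set(proposal), set(final)
    return len(a & b) / len(a | b)

prop  = ["para 1", "para 2", "para 3"]
final = ["para 1", "para 2", "para 3", "para 4"]
print(paragraph_growth(prop, final))   # ≈ 0.2857 (one paragraph added)
print(paragraph_jaccard(prop, final))  # 0.75
```

Note that both measures are undefined when both documents have zero legal paragraphs, which is one reason for the sample restriction described next.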
When computing these measures, I restrict the sample of simple streams to those where the Proposed Rule has at least one legal paragraph. The measures are still defined for other cases, but the interpretation is much less clear and I would like to handle the case of proposals with no legal text separately.