My research on the US federal notice-and-comment process

View the Project on GitHub bradhackinen/regcomments

Regulatory Comments

I’m using this page to document my ongoing research on the notice-and-comment process in US federal regulation. Putting it up on GitHub is an experiment and a work in progress. I hope it will make my work accessible and transparent.



Every year federal regulators in the US receive thousands of public comments on proposed regulations, with content ranging from simple pleas to sophisticated technical and legal arguments. Submitters include thousands of companies, non-profit organizations, unions, and government entities, as well as millions of individual citizens who submit comments as part of coordinated campaigns. Under the Administrative Procedure Act, US federal agencies must read every comment and respond to the main points raised by commenters when the agencies publish final regulations.

Regulatory comments occasionally appear in the news, such as when the FCC received millions of comments on its Net Neutrality proposal (many of which turned out to be fake), or when the EPA received millions of comments on major regulations like the Clean Power Plan. But comments are submitted every day, largely unnoticed, potentially influencing a bewildering array of regulatory changes.

I’m interested in these comments for a number of reasons:

  1. Comments might be very important. In the US, a huge amount of public policy is developed and implemented in the form of regulations administered by government agencies (as opposed to being legislated directly by politicians). If comments have even a small influence on the final form of regulations then they have a big impact in total.
  2. Comments are cheap. Lobbying is expensive (interest groups spend billions of dollars on lobbyists in the US at the federal level alone), and the consensus in the economics literature is that lobbying has high fixed costs: it is expensive to develop the expertise and build the political connections. But submitting a comment is basically free. Does this mean that the notice-and-comment requirement of the Administrative Procedure Act is a form of lobbying subsidy? How does this affect who participates and what outcomes are realized?
  3. Comments have content. I’m not aware of any other type of lobbying data where you can see exactly what an interest group is saying to the government. Using comment content it might be possible to shed light on: the relative influence of different groups, competition and cooperation between interest groups, and what techniques interest groups use to persuade.
  4. Comments are unstudied. I’m not aware of any existing academic research that uses data from regulatory comments (please let me know if I’m missing someone’s work!). The low submission cost and the fact that comments target government agencies rather than elected politicians suggest that they may work very differently from other types of lobbying.



Size of the dataset

Regulations.gov was launched in 2003, but most agencies didn’t start using it until about 2007. However, the early adopters (the EPA and federal departments) publish a large fraction of all proposed rules. By 2005, most proposals published in the Federal Register directed readers to comment on Regulations.gov. A small number of agencies have also digitized older comments, but this data is incomplete.

Figure 1. Agencies using Regulations.gov by year

Figure 2. Proposals linked to Regulations.gov by year

The number of comments posted on Regulations.gov has grown drastically over the last 10 years, almost 100-fold since 2005. Most of the growth is driven by simpler comments with no attachments and repetitive comments associated with campaigns.

Figure 3. Number of comments posted on Regulations.gov by year


A big part of this project is simply figuring out who the comment submitters are. It’s a complicated process that I’ll document in detail soon. The key thing to understand is that it is imprecise. The raw data is big and messy, and it is impractical to review every comment manually. I use a custom machine learning algorithm to parse comment metadata and extract organization names. To get an idea of the accuracy, I manually annotated a random sample of 1,000 comments to use as a test set. At the moment accuracy is about 90%, and the algorithm finds about 260,000 unique names that look like valid organizations. The Venn diagram below shows the overlap between organizations that comment on Regulations.gov, federal lobbying clients, and Compustat North America companies. Note that linking names across datasets is also an imprecise process where I have had to use another custom machine learning algorithm.
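The accuracy figure above is just exact-match agreement between the algorithm's output and the manual annotations. A minimal sketch of that evaluation (the example names are hypothetical, and the real pipeline compares names after normalization):

```python
def extraction_accuracy(predicted, annotated):
    """Fraction of comments where the extracted organization name
    matches the manual annotation (None marks an individual commenter)."""
    assert len(predicted) == len(annotated)
    matches = sum(p == a for p, a in zip(predicted, annotated))
    return matches / len(predicted)

# Hypothetical test set: the extractor agrees on 4 of 5 comments.
pred = ["Acme Corp", None, "Sierra Club", "AFL-CIO", "Exxon"]
gold = ["Acme Corp", None, "Sierra Club", "AFL-CIO", "ExxonMobil"]
print(extraction_accuracy(pred, gold))  # 0.8
```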

Organization comment counts

Figure 4. Venn diagram of organizations in linked data

Comments and lobbying

Commenting and lobbying expenditures

The plot below shows the relationship between the total number of comments submitted and money spent on federal lobbyists for organizations that do both during the years 2007-2017. There is a strong positive correlation between submitting comments and hiring lobbyists.

Figure 5. Comments and Lobbying expenditures (All linked organizations, 2007-2017)

Figure 6. Number of agencies contacted by comment or lobbyist (Compustat North America Companies, 2007-2017)

Figure 7. Correlation between number of agencies contacted via comment and lobbyist (Compustat North America Companies, 2007-2017)

Figure 8. Number of firms contacting agencies by comment or lobbyist (Compustat North America Companies, 2007-2017)

Participation and industry concentration

Another important question we can examine with Compustat data is whether the choice to comment depends on the structure of a firm’s industry. Previous research by my advisors Matilde Bombardini and Francesco Trebbi has shown that firms in more competitive industries rely more on trade associations to coordinate lobbying. Do the same patterns appear in the choice to comment? Or do the different costs of commenting lead firms to comment on their own more often? At this point I don’t have data on trade associations, but we can still examine whether there are obvious differences between commenting in competitive and concentrated industries.

The first plot below shows the probability that a firm comments or hires a lobbyist as a function of the Herfindahl index of the 50 largest firms in that firm’s primary industry. Firms in concentrated industries (high Herfindahl index) are more likely to comment and to hire lobbyists, but the two probabilities are extremely similar.

Figure 9. Probability of commenting or hiring a federal Lobbyist (Compustat North America Companies, 2007-2017)

One issue with the above plot is that we know industry concentration and firm size are correlated. The plots below separate concentration and size as independent axes, and show probability estimates as heatmaps in this 2-d space. It appears that the relationship between lobbying and concentration is driven mainly by firm size. The similarity between the two heatmaps is quite surprising to me. Among Compustat firms at least, the choice to comment and the choice to hire a lobbyist look almost identical.

Figure 10. Probability of commenting or hiring a federal Lobbyist (Compustat North America Companies, 2007-2017)

Regulatory Change

As a researcher, one of the most interesting features of the notice and comment process is the fact that most regulations are published in multiple stages, allowing us to see how they developed over time.



Regulations are years in the making, and agencies usually publish multiple documents in the Federal Register for each regulatory change. In the simplest case, agencies publish exactly two documents for each regulatory change:

  1. A Proposed Rule. This document outlines the agency’s regulatory intent, including what they plan to change about the regulatory environment, why they are making the change, and the legal basis for the agency’s authority to do so. Some proposed rules include options that agencies present for feedback. All proposals provide information about how to contact the agency to comment on the regulation, and (usually) specify a 90- or 120-day period during which the agency is formally accepting comments.

  2. A Rule. This document describes the final form of the regulatory change and the date that the change is effective. Agencies are also required to discuss and respond to all the comments they have received (at least at the level of broad themes). Some rules request comments to aid the agency in designing the next iteration of the regulation.

This process can be complicated by additional steps like Comment Extensions, Advance Notices of Proposed Rulemaking, Interim-Final Rules, Direct-Final Rules, Corrections, Re-Prints, or a variety of Notices.

I call a sequence of documents that are all related to the same regulatory change a rulemaking stream. I call a stream complete if it contains both a Proposed Rule and a Rule, and I call a stream simple if it contains exactly one Proposed Rule, one Rule, and no further documents except Notices and Comment Extensions. For most of the analysis below I restrict the sample to simple streams.
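The complete/simple definitions above reduce to a check on the multiset of document types in a stream. A minimal sketch (type labels are illustrative strings, not the actual Federal Register type codes):

```python
# Document types that don't disqualify a stream from being "simple"
IGNORED = {"Notice", "Comment Extension"}

def is_complete(stream):
    """Complete: the stream contains both a Proposed Rule and a Rule."""
    return "Proposed Rule" in stream and "Rule" in stream

def is_simple(stream):
    """Simple: ignoring Notices and Comment Extensions, the stream is
    exactly one Proposed Rule plus one Rule and nothing else."""
    core = sorted(d for d in stream if d not in IGNORED)
    return core == ["Proposed Rule", "Rule"]

print(is_simple(["Proposed Rule", "Comment Extension", "Rule"]))   # True
print(is_simple(["Proposed Rule", "Interim-Final Rule", "Rule"]))  # False
```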

Unfortunately there is no formal definition for how documents should be grouped together, and agencies are sloppy about documenting the relationships. I have had to develop my own algorithm for grouping documents into streams based on whatever identifiers are available. While most groupings are fairly unambiguous, many edge-cases and poorly documented relationships exist. In these cases my algorithm attempts to infer the most likely relationship.
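The core of this grouping step can be sketched as a union-find over shared identifiers: any two documents that share an identifier (a docket number, a RIN, an explicit cross-reference) end up in the same stream. The document IDs and identifiers below are hypothetical, and the actual algorithm handles far more edge cases than this:

```python
from collections import defaultdict

def group_into_streams(documents):
    """Union-find grouping: documents sharing any identifier land in
    the same stream. `documents` maps doc_id -> set of identifiers."""
    parent = {doc_id: doc_id for doc_id in documents}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Invert the mapping: identifier -> documents that carry it
    by_ident = defaultdict(list)
    for doc_id, idents in documents.items():
        for ident in idents:
            by_ident[ident].append(doc_id)

    # Union all documents that share an identifier
    for ids in by_ident.values():
        for other in ids[1:]:
            parent[find(other)] = find(ids[0])

    streams = defaultdict(set)
    for doc_id in documents:
        streams[find(doc_id)].add(doc_id)
    return list(streams.values())

docs = {
    "FR-A": {"RIN-0001"},
    "FR-B": {"RIN-0001", "DOCKET-9"},  # bridges A and C
    "FR-C": {"DOCKET-9"},
    "FR-D": {"RIN-0002"},
}
streams = group_into_streams(docs)  # {A, B, C} and {D}
```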

Paragraph types

The most important part of a Rule is a section of legal text that uses codified language and formatting to describe changes to the Code of Federal Regulations (CFR). CFR text is organized into hierarchical numbered paragraphs. The legal text in a Rule describes which paragraphs should be removed, added, or modified, and presents all the new and modified CFR text. In some sense, this is all a rule is: a collection of edits to the CFR with a date for when the changes take effect. Many Proposed Rules also include draft legal text.

At the same time, most of the text in Rules and Proposed Rules is not legal text. Agencies spend many pages summarizing the rule in plain language, discussing the motivation and legal foundation of the rule, responding to comments, and dealing with other procedural matters. I call this non-legal text collectively discussion text.

The plot below shows the distribution of the lengths of legal and discussion text for all simple streams between 1994 and 2017, measured by number of paragraphs (note that discussion paragraphs are typically longer than legal paragraphs). There is a broad range of document sizes and legal complexity. Some documents are very long.

Measures of change

When proposals present draft versions of legal text, we have an opportunity to examine how the legal text changes between the Proposed Rule and the final Rule. I’ll present data based on two measures of change.

When computing these measures, I restrict the sample of simple streams to those where the Proposed Rule has at least one legal paragraph. The measures are still defined for other cases, but the interpretation is much less clear and I would like to handle the case of proposals with no legal text separately.
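To give a flavor of what paragraph-level comparison looks like (this is purely illustrative, not either of the two measures used in this project), one simple quantity is the fraction of final-rule legal paragraphs that do not appear verbatim in the proposal:

```python
def paragraph_change(proposal_paras, rule_paras):
    """Illustrative only: share of final-rule paragraphs with no
    verbatim match in the proposed rule's legal text."""
    proposed = set(proposal_paras)
    if not rule_paras:
        return 0.0
    changed = sum(p not in proposed for p in rule_paras)
    return changed / len(rule_paras)

# One of three final paragraphs ("x") is new relative to the proposal.
print(paragraph_change(["a", "b", "c"], ["a", "b", "x"]))
```

A measure like this is undefined in spirit (though not in code) when the proposal has no legal paragraphs, which is one reason to treat that case separately, as noted above.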

Summary plots

Agency differences

Comments and regulatory change