Questions about extremely large C-TAM imputed benefits in the CPS data #281

@martinholmer

@andersonfrailey and @Amy-Xu, Now that taxdata pull requests #178 (fix TANF values), #185 (use Medicare and Medicaid actuarial values) and #278 (ignore veterans benefits in the distribution of other benefits to filing units) have been merged over the past month or so, I've spent some time looking at filing units that have what seem to me to be extremely large benefits.

I found the CPS records to examine using a two-step process. First, the Python script below finds the RECID values of filing units that have large benefits in the cps.csv.gz file. Second, the csv_show.sh bash script, which is part of the Tax-Calculator repository, displays the non-zero variables in each of those records.

One filing unit found in this way is shown in my recent comment on taxdata pull request #135. That filing unit has an imputed TANF benefit of about $136,000 even though the taxpayer and spouse have combined earnings of over one million dollars.

Looking at the filing units with large tanf_ben and vet_ben values raises a question about how the CPS filing units are constructed. Among those with extremely large tanf_ben and vet_ben values are two different groups of fifteen records, all of which appear to have exactly the same demographics and earnings (but different unearned incomes) and exactly the same large benefit. What's going on here? Why do the CPS data include these nearly identical records? Why are there fifteen near replicates? Where in the code are the fifteen near replicates created?
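
One way to confirm the near-replicate pattern is to group the large-benefit records by the variables that appear identical and count the records in each group. The following is only a sketch: the function name replicate_groups is hypothetical, the column names age_head and e00200 are assumed to be present in cps.csv.gz, and the toy DataFrame is made up for illustration rather than taken from the CPS file.

```python
import pandas as pd


def replicate_groups(data, benefit, threshold, keys):
    """Count large-benefit records that share the same key values.

    Records with data[benefit] >= threshold are grouped by the key
    columns plus the benefit amount itself; each output row is one group.
    """
    big = data[data[benefit] >= threshold]
    sizes = big.groupby(keys + [benefit]).size()
    return sizes.reset_index(name='num_records')


# Made-up toy data: three records that differ only in unearned income
# (e00300 here) and one unrelated record.
toy = pd.DataFrame({
    'RECID': [1, 2, 3, 4],
    'age_head': [57, 57, 57, 40],
    'e00200': [0, 0, 0, 50000],
    'e00300': [10, 20, 30, 0],
    'vet_ben': [169920, 169920, 169920, 0],
})
groups = replicate_groups(toy, 'vet_ben', 156000, ['age_head', 'e00200'])
print(groups)
```

On the real file, a group with num_records equal to 15 would correspond to one of the sets of consecutive RECID values listed in the script output.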

But quite apart from the groups of fifteen nearly identical filing units, I don't understand how people with high incomes can be thought to be getting TANF benefits. Is that imputation being done in C-TAM code or in taxdata code?
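
To gauge how common the high-income TANF cases are, the records could be filtered on both conditions at once. This is a minimal sketch, assuming that e00200 holds the filing unit's wage and salary income in cps.csv.gz; the function name high_earner_tanf, the earnings floor, and the toy data are all hypothetical.

```python
import pandas as pd


def high_earner_tanf(data, earnings_floor):
    """Return records with a positive tanf_ben and earnings at or above the floor."""
    mask = (data['tanf_ben'] > 0) & (data['e00200'] >= earnings_floor)
    return data.loc[mask, ['RECID', 'e00200', 'tanf_ben']]


# Made-up toy data, not values from the CPS file.
toy = pd.DataFrame({
    'RECID': [1, 2, 3],
    'e00200': [1200000, 15000, 900000],
    'tanf_ben': [5000, 4000, 0],
})
suspect = high_earner_tanf(toy, earnings_floor=1000000)
print(suspect)
```

This only locates the suspicious records; it does not say whether the imputation happens in C-TAM code or in taxdata code.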

The one filing unit (represented by fifteen near replicates) with a very large vet_ben value could plausibly be a retired three-star general with somewhere around 35 years of service as @feenberg suggested in C-TAM issue 73. The taxpayer is 57 years old and has a vet_ben value of $169,920. That amount includes our estimate of the actuarial value of access to the VA hospital system, which is about $9,890. So, the amount of what seems to be a pension for military service is roughly $160,000 per year.
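
The arithmetic behind the roughly $160,000 figure, using the two amounts quoted above (the variable names are illustrative only):

```python
# Amounts quoted in the paragraph above.
vet_ben = 169920            # total vet_ben for the filing unit
va_actuarial_value = 9890   # approximate actuarial value of VA hospital access

# What remains after removing the VA actuarial value looks like a pension.
implied_pension = vet_ben - va_actuarial_value
print(implied_pension)  # 160030
```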

But it seems to me that including military retirement pensions in vet_ben is incorrect because, as taxable income, they should instead be added to the e01700 variable.

And the fact that vet_ben seems to consist largely of military (retirement or disability) pensions and retiree medical benefits raises another question in my mind. Given that vet_ben is largely deferred compensation for those who served in the military, why would this kind of income ever be considered for repeal as part of a UBI reform? If these benefits are thought to be "welfare" (rather than deferred compensation), why didn't the C-TAM project include the pension benefits and health insurance benefits accruing to retired federal (or state and local) government employees? If retired government employees were not a focus of the C-TAM work because they receive deferred compensation rather than "welfare", then why was the deferred compensation of those with military service included in the scope of the C-TAM project?

Now the details. First, the Python script called bentab.py:

from __future__ import print_function
import numpy as np
import pandas as pd

data = pd.read_csv('cps.csv.gz')
print('num_filing_units:', data.shape[0])

def big_recids(big):
    """Print the RECID of every filing unit in the big DataFrame."""
    print('   RECIDs:')
    for rid in big['RECID'].tolist():
        print('     ', rid)

big = data[data['XTOT'] >= 14]
print('num_with_XTOT>=14:', big.shape[0])
big_recids(big)

big = data[data['ssi_ben'] >= 48000]
print('num_with_ssi>=$48K:', big.shape[0])
big_recids(big)

big = data[data['tanf_ben'] >= 120000]
print('num_with_tanf>=$120K:', big.shape[0])
big_recids(big)

big = data[data['vet_ben'] >= 156000]
print('num_with_vet>=$156K:', big.shape[0])
big_recids(big)

And now the output from that Python script:

taxdata/cps_data$ python bentab.py
num_filing_units: 456465
num_with_XTOT>=14: 3
   RECIDs:
      83778
      422382
      434766
num_with_ssi>=$48K: 2
   RECIDs:
      280113
      403676
num_with_tanf>=$120K: 20
   RECIDs:
      76509
      76510
      76511
      76512
      76513
      76514
      76515
      76516
      76517
      76518
      76519
      76520
      76521
      76522
      76523
      135454
      191624
      311549
      312738
      315578
num_with_vet>=$156K: 15
   RECIDs:
      119232
      119233
      119234
      119235
      119236
      119237
      119238
      119239
      119240
      119241
      119242
      119243
      119244
      119245
      119246
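
For the second step, a rough pandas analogue of listing one record's non-zero variables might look like the sketch below. It does not reproduce the output format of csv_show.sh; the function name nonzero_vars is hypothetical and the toy data are made up.

```python
import pandas as pd


def nonzero_vars(data, recid):
    """Return the non-zero variables of the record with the given RECID."""
    row = data.loc[data['RECID'] == recid].iloc[0]
    return row[row != 0]


# Made-up toy data, not values from the CPS file.
toy = pd.DataFrame({
    'RECID': [1, 2],
    'e00200': [0, 40000],
    'tanf_ben': [3000, 0],
    'XTOT': [2, 3],
})
shown = nonzero_vars(toy, 1)
print(shown)
```

With the real file, data would come from pd.read_csv('cps.csv.gz') and recid would be one of the RECID values printed above.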

@MattHJensen @MaxGhenis
