Open Source Asset Pricing

July 2020 Data Release

Fixes

  • Previously, quarterly Compustat data was lagged only one month instead of three. The lag is intended to account for the delayed release of accounting data. Because Cash and EarnIncrease were affected by this revision, we also revised these two predictors to be closer to the original papers. Incidentally, this seems to be the same bug that Robert Novy-Marx found to affect earlier versions of the Hou, Xue, and Zhang (previously L. Chen and Zhang) 4-factor model. The fix lowers the t-stats for NumEarnIncrease, EarnIncrease, and EarnSupBig by 6.0, 1.6, and 1.5, respectively, but has little effect on our main results.
  • Previously, our method for merging datasets dropped permnos when multiple permnos were assigned to the same ticker. This dropped only 301 of the 27,000 permnos in the full dataset. We now merge more carefully so that no permnos are dropped. This results in small changes to the t-stats of all predictors: the average change is -0.007, and the standard deviation of the changes is 0.078.
  • We are grateful to Yang Liu (Tsinghua Finance), who carefully examined our data and pointed out the quarterly Compustat error.
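The two fixes above can be sketched in Python with pandas. This is only an illustration under stated assumptions, not the project's actual code: the column names `datadate`, `permno`, and `ticker`, both function names, and the idea of merging from a permno-ticker link table are hypothetical stand-ins.

```python
import pandas as pd


def lag_quarterly_availability(comp: pd.DataFrame, months: int = 3) -> pd.DataFrame:
    """Add the month end at which each quarterly record is assumed to be public.

    Assumes a hypothetical datetime column `datadate` holding the fiscal
    quarter-end date of each record.
    """
    out = comp.copy()
    # Shift the quarter-end date forward by `months` calendar months.
    shifted = out["datadate"] + pd.DateOffset(months=months)
    # Snap to the month end so the date lines up with monthly return data.
    out["available_date"] = shifted + pd.offsets.MonthEnd(0)
    return out


def merge_keep_all_permnos(link: pd.DataFrame, other: pd.DataFrame) -> pd.DataFrame:
    """Merge on ticker while keeping every permno, even when a ticker is shared.

    `link` is a hypothetical permno-ticker table; `other` is keyed by ticker.
    """
    # A left merge from the link table keeps one row per (permno, ticker) pair,
    # so two permnos that share a ticker both survive.
    merged = link.merge(other, on="ticker", how="left")
    # Sanity check: fail loudly if any permno disappeared.
    missing = set(link["permno"]) - set(merged["permno"])
    if missing:
        raise ValueError(f"merge dropped {len(missing)} permnos")
    return merged
```

With a three-month lag, a record dated 2020-03-31 is treated as available at the end of June 2020 rather than the end of April 2020. The explicit check in the merge is a design choice: it turns a silent loss of permnos into an error.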

Data

  • 208 firm-level characteristics (1GB zipped csv)
    • These omit size and price characteristics, which can be downloaded from WRDS.
    • Warning: Matlab has trouble reading very large CSV files (H/T Chris Jones @USC).
  • Returns for 210 long-short portfolios
    • Link is fixed (Jan 27, 2021) thanks to Huichou Huang.
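For readers whose tools struggle with the large characteristics file, one option is to read it in pieces. The Python sketch below uses pandas to load only selected columns, one chunk of rows at a time. The function name, the file name, and the column names in the usage comment are assumptions for illustration; check them against the downloaded file.

```python
import pandas as pd


def load_columns_in_chunks(path, columns, chunksize=500_000):
    """Read only `columns` from a large CSV, one chunk of rows at a time."""
    pieces = []
    # pandas infers compression from the file extension (for example .zip or .gz),
    # provided the archive contains a single CSV file.
    for chunk in pd.read_csv(path, usecols=columns, chunksize=chunksize):
        pieces.append(chunk)
    return pd.concat(pieces, ignore_index=True)


# Hypothetical usage; the file and column names are placeholders:
# signals = load_columns_in_chunks("characteristics.zip", ["permno", "yyyymm", "Cash"])
```

Restricting the columns is what saves memory here; the chunk size mainly controls how much is parsed at once.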

Additional “Test Asset” Portfolios

  • Caution: the code behind these portfolios is not in the repo and has not been checked carefully.
  • Returns for N portfolios for each predictor based on the original papers
    • For each predictor, we generate 5 portfolios if our benchmark is quintiles, 10 portfolios if our benchmark is deciles, 2 if binary, etc.
  • Returns for 10 portfolios for each non-binary predictor formed by decile sorts.
    • Portfolio weights based on the original papers (typically equal-weighted), all stock breakpoints
    • Value-weighted returns, all stock breakpoints
    • Value-weighted returns, NYSE breakpoints
    • Caution: some predictors are not well-behaved enough to produce deciles
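The difference between the sorting variants above can be sketched for a single month of data. This is a simplified illustration, not the code behind the posted portfolios: the column names `signal`, `ret`, `me` (market equity), and `is_nyse`, as well as both function names and the tie-handling rule, are assumptions.

```python
import numpy as np
import pandas as pd


def assign_portfolios(df, signal="signal", n_port=10, nyse_only=False):
    """Assign each stock to a portfolio 1..n_port using quantile breakpoints.

    Expects non-missing signal values and a boolean column `is_nyse`.
    """
    # Breakpoints come from NYSE stocks only, or from all stocks.
    base = df.loc[df["is_nyse"], signal] if nyse_only else df[signal]
    cutoffs = np.quantile(base.to_numpy(), np.linspace(0, 1, n_port + 1)[1:-1])
    # searchsorted maps each signal value to a bin 0..n_port-1; add 1 for 1..n_port.
    port = np.searchsorted(cutoffs, df[signal].to_numpy(), side="left") + 1
    return pd.Series(port, index=df.index, name="port")


def portfolio_returns(df, signal="signal", n_port=10, nyse_only=False, value_weight=True):
    """Return one weighted-average return per portfolio for a single month."""
    df = df.dropna(subset=[signal, "ret", "me"]).copy()
    df["port"] = assign_portfolios(df, signal, n_port, nyse_only)
    # Value weights use market equity; equal weights give every stock weight 1.
    weights = df["me"] if value_weight else pd.Series(1.0, index=df.index)
    weighted_sum = (df["ret"] * weights).groupby(df["port"]).sum()
    weight_sum = weights.groupby(df["port"]).sum()
    return weighted_sum / weight_sum
```

Because NYSE stocks tend to be larger, NYSE breakpoints typically place many small non-NYSE stocks in the extreme portfolios, which is one reason the three variants can give different returns. A predictor with many tied values can leave some portfolios empty, consistent with the caution above.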

Additional Data for Making the Extended Dataset (Not Recommended)

  • The extended data contains characteristics that were not shown to clearly predict returns in the original papers. Most of these come from Hou, Xue, and Zhang’s (2020) “Replicating Anomalies” paper.
  • This includes, for example, R&D / Sales. Chan, Lakonishok, and Sougiannis (2001) find that “there is little if any relation between R&D relative to sales and future returns.”
  • This also includes, for example, analyst coverage. Scherbina (2008) shows that change in analyst coverage predicts returns, based on the idea that a decline in coverage may be caused by bad news. Elgers, Lo, and Pfeiffer (2001) use analyst coverage to show their predictor’s power is related to information frictions. Neither paper examines the predictive power of analyst coverage nor argues that analyst coverage should predict returns.
  • For readers interested in characteristics that may or may not predict returns, we recommend randomly generating characteristics following Yan and Zheng (2017) or Chordia, Goyal, and Saretto (2020).
  • But if you must, here are 104 additional firm-level characteristics that may or may not predict returns (0.7 GB)
    • This omits the bid-ask spread from TAQ due to license restrictions. TAQ spreads can be produced using code from Andrew’s work with Mihail Velikov, which is based on Holden and Jacobsen’s code.
  • Here are 1,050 additional portfolios
    • These are mostly made by altering the rebalancing frequency of other portfolios.

Data downloads

To download data, head to the Data tab and left click a link to open it in a new tab. The links point to Google Drive, so you may need to click the download button.

Authors

Andrew Y. Chen
Tom Zimmermann

News

2022-03-30: Data update (March 2022) including

  • Portfolios updated to end of 2021
  • Two additional signals
  • 2×3 Fama-French factor style portfolios
  • Head to the data page and our repo for details

2021-04-22: 04/2021 data release, Code v1.1.0

  • Added daily portfolio returns
  • Additional portfolio implementations
  • Added methods for accessing data
  • Minor bug fixes

2021-03-19: March 2021 data release, Code v1.0.0

  • Improved / Fixed signals: EarningsStreak, DivSeason, UpRecomm, DownRecomm, and more!
  • New Signals: CoskewACX, AnalystRevision, FEPS, OrderBacklogChg
  • Removed a couple of redundant signals
  • Major modularization update: code entirely rewritten
    • Each signal is constructed in its own file
    • Each signal has its own csv
  • Error handling, improved logging
The views expressed herein are those of the authors and do not necessarily reflect the position of the Board of Governors of the Federal Reserve or the Federal Reserve System.
This project has received support from the Deutsche Forschungsgemeinschaft (DFG) under Germany’s Excellence Strategy EXC2126/139083886.