11 sectors · IT
Capstone · AllianceBernstein

Market Signal Extraction

Lead analyst · Jan–May 2024

IT sector signal: ~13% backtested return

Problem

Investment teams read enormous volumes of financial news, but turning that reading into something systematically useful is hard. Gathering the data is not the challenge. The challenge is building signals that can be validated against real market returns instead of being assumed informative.


What I Built

~68K S&P 500 financial news articles
  → LDA + BERTopic topic modeling (baseline comparison)
  → GPT-3.5 + LangChain prompt engineering (primary)
  → FinBERT sentiment classification
  → Sector-level signal aggregation (11 sectors)
  → Alphalens factor validation against real market returns
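
To make the aggregation step concrete, here is a minimal sketch of how article-level sentiment could be rolled up into a per-sector daily signal. It is an illustration under stated assumptions, not the project's actual code: the record layout, the `LABEL_SCORE` mapping, the confidence weighting, and the `sector_signals` helper are all hypothetical, and the sample rows are made up. It uses only the standard library; the same grouping could be written as a Pandas groupby.

```python
from collections import defaultdict
from statistics import mean

# Assumed mapping from a three-class sentiment label to a signed score.
LABEL_SCORE = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def sector_signals(articles):
    """Average signed, confidence-weighted sentiment per (date, sector).

    `articles` is an iterable of (date, sector, label, confidence) tuples;
    this layout is hypothetical and chosen only for the example.
    """
    buckets = defaultdict(list)
    for date, sector, label, confidence in articles:
        buckets[(date, sector)].append(LABEL_SCORE[label] * confidence)
    return {key: mean(scores) for key, scores in buckets.items()}


# Made-up sample rows, not project data.
articles = [
    ("2024-01-02", "IT", "positive", 0.9),
    ("2024-01-02", "IT", "negative", 0.5),
    ("2024-01-02", "Energy", "neutral", 0.8),
]
signals = sector_signals(articles)
for key in sorted(signals):
    print(key, round(signals[key], 3))
```

A plain mean is the simplest choice here; weighting by article count or recency would be a reasonable variation, but nothing in this write-up says which scheme the project used.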

Results

The IT sector signal produced about 13% in backtested returns, with a clear relationship between topic-level sentiment and the sector's subsequent performance. The full factor analysis went to AllianceBernstein's portfolio team and showed that LLM-extracted signals can generate statistically meaningful alpha when built and validated carefully.
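
One common way to quantify the relationship between a signal and later returns is a rank information coefficient: the Spearman correlation between signal values and forward returns. The sketch below is a hand-rolled, standard-library illustration of that idea, not the Alphalens API and not the analysis delivered on this project. The function names and the sample numbers are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rank_ic(signal, forward_returns):
    """Spearman rank correlation between a signal and forward returns."""
    return pearson(average_ranks(signal), average_ranks(forward_returns))


# Made-up values for four sectors on one date, not project data.
signal = [0.2, -0.1, 0.5, 0.0]
forward_returns = [0.01, -0.02, 0.03, 0.00]
ic = rank_ic(signal, forward_returns)
print(round(ic, 3))
```

Using ranks instead of raw values keeps a single extreme return from dominating the estimate, which is why rank-based measures are a popular first check on a candidate factor.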


Stack

Python · LDA · BERTopic · GPT-3.5 API · LangChain · FinBERT · Alphalens · Pandas