The Threshing Floor

The Threshing Floor is a desktop tool. Gathering data means opening a Reddit link and copying its text between browser tabs — impractical on a phone. You can read, browse saved collections, and export here, but to collect new data, come back on a laptop or desktop.

What is Thresh?

Every day, millions of people talk to each other on Reddit. They ask for help. They share what scares them. They argue about what matters. Most of it disappears into the scroll.

Thresh pulls it out.

Point it at any subreddit. Tell it what you're looking for. It hands you a clean dataset with a complete record of how it was collected. No accounts, no API keys, no code. Everything runs in your browser. Everything stays on your machine.

The threshing floor is among the oldest inventions in human history. High ground, flat stone, wind. The place where you brought the harvest and beat it until the grain fell free. Every civilization built one. Covenants were sealed on threshing floors. Altars were raised on them. Prophets used them to talk about judgment. They were never passive places. They were where you did the work of separation. The flood is information now, and the grain is what people are actually saying to each other when they think no one important is listening.

Version 2

Why the Method Changed

Version 1 of Thresh fetched Reddit data for you in the background, through a small relay server. It worked beautifully — until Reddit changed the rules. Over the past few years Reddit progressively locked down automated access to its data, and by 2026 it was actively blocking the data-center servers that tools like Thresh run on. The relay started coming back empty. Not because the data is private — it's still right there, public, in everyone's browser — but because Reddit no longer lets a server fetch it on your behalf.

So Version 2 moves the harvest back into your hands. Thresh builds the precise link to the data you want; you open it in your own browser — a real person on a real connection, which Reddit still serves — and paste what you see back onto the Floor. Everything after that is exactly as before: cleaning, word frequencies, AI analysis, provenance, export.

It's slower, and it's a desktop process. But it's honest, transparent, and hard to block quietly, and it deepens the old metaphor. The threshing floor was always a place of manual labor. You carry the harvest in yourself. The fuller story appears later, in the longer Why the Method Changed section.

Quick Start

Enter a subreddit name, open the link Thresh builds, and paste the data back. No API keys, no code, no setup.

How It Works

1. Thresh: Set a query, open the link Thresh builds, paste the data back
2. Harvest: Browse and search your collected data
3. Winnow: Analyze with word frequency charts and optional AI
4. Glean: Export data with provenance, or generate an AI research report

Who Uses This — And How

Public Health Researcher

"What are people in r/mentalhealth talking about this month?"

Thresh: r/mentalhealth • Top • Past month • 100 posts
Harvest: Sort by score to find what resonates most
Winnow: Run Identify themes to map dominant concerns
Glean: Export CSV with anonymized usernames for IRB-ready analysis

Strategy tip: Collect the same subreddit monthly and compare word frequency tables. Emerging terms (e.g., a new medication name, a policy change) show you what’s shifting in the conversation before it shows up in the literature.

Journalist

"What questions are people asking in r/personalfinance about student loans?"

Thresh: r/personalfinance • Top • Past week • keyword: "student loans"
Harvest: Sort by num_comments for the biggest conversations
Winnow: Run Extract questions to find what people need answered
Glean: Provenance.txt gives your editor a transparent methodology section

Strategy tip: Try the same keyword across different subreddits (r/personalfinance, r/studentloans, r/povertyfinance). The same topic sounds different depending on who’s talking — that contrast is the story.

Graduate Student

"I need to compare discourse in r/science vs. r/conspiracy for my thesis."

Thresh: science, conspiracy • Top • Past year • keyword: "vaccine"
Harvest: Compare upvote_ratio to see consensus vs. division
Winnow: Run Sentiment analysis on each, then a Custom prompt comparing tone
Glean: Two exports, each with its own provenance; cite both in your methods section

Strategy tip: Use the Academic report format on each collection separately, then use a Custom prompt to compare them side by side. You now have a draft comparative analysis with proper methodology documentation for your advisor.

Community Organizer

"What are residents saying in our city's subreddit about the new transit plan?"

Thresh: r/yourcity • New • Past month • keyword: "transit"
Harvest: Enable comments to hear the full conversation, not just headlines
Winnow: Run Summarize discussion to distill what people actually want
Glean: JSON export feeds directly into your own tools or dashboards

Strategy tip: Collect once with Top sort (what resonates) and once with Controversial sort (what divides). The gap between those two collections is where the real debate lives — and where a town hall should focus.

Labor Market Economist

"What pain points and unmet needs are job seekers describing in r/jobs right now?"

Thresh: r/jobs • Top • Past month • 100 posts with comments
Harvest: Sort by score to find which frustrations resonate most widely
Winnow: Run Identify themes (salary transparency? ghosting? application fatigue?) and see what the data says
Glean: Generate an AI Research Report framed for your academic audience. You now have a first draft of a labor market sentiment brief grounded in real worker voices

Strategy tip: Thresh the same subreddit once a month for three months and compare word frequencies over time. That’s a longitudinal snapshot of worker sentiment — impossible to get from BLS data alone.

Political Campaign Manager

"What issues are people in r/Denver fired up about before our town hall?"

Thresh: r/Denver • Hot • Past week • 100 posts with comments
Harvest: Sort by num_comments to find what’s sparking the most debate
Winnow: Run Extract questions to see what voters are actually asking. Then Identify themes to categorize by issue: housing, crime, transit, cost of living
Glean: Generate a Town Hall Brief for your candidate. Export the data as a CSV backup for your comms team

Strategy tip: Collect from multiple city subreddits (r/Denver, r/DenverFood, r/DenverCirclejerk) to see how the same issues land in different community contexts. Each collection gets its own provenance seal.

You carry the harvest in by hand.

Reddit no longer lets tools fetch its data automatically — it blocks the servers they run on. So The Threshing Floor does it the honest way: it builds the exact link to the data you want, you open it in your own browser and copy what Reddit shows you, and you paste it back here. Thresh takes over from there — cleaning, organizing, and preparing it for analysis and export.

Three steps: set the query → gather from Reddit → lay it on the Floor. This is a desktop process: it needs two browser tabs and a copy-paste. For the reasons behind it, see Why the Method Changed.

Set the query

Tell Thresh what to gather

Subreddit

Enter the subreddit name without the r/ prefix. To gather several at once, separate them with commas — Thresh combines them into a single Reddit link (e.g. science, AskScience → one page).

Sort By

Top = highest-scored posts (best for research). Hot = trending now. New = most recent. Rising = gaining momentum. Controversial = most divided (low upvote_ratio).

Time Filter

Controls which posts are eligible. Past week is a good default. Use Past year or All time for broader studies. (Applies to Top & Controversial.)

Max Posts

Reddit shows up to 100 posts per page. Up to 100 is a single copy-paste. 250–500 means Thresh walks you through 3–5 pages, one paste each — it hands you the next link automatically after every paste.

Keyword Filter

Searches the subreddit for this term. Leave empty to gather everything matching your sort/time settings. (Keyword searches are a single page.)
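
The settings above map onto the address Thresh hands you. The sketch below is a rough illustration of how such a link could be assembled, not Thresh's actual code: the function name build_listing_url and its parameters are hypothetical, and the URL shapes follow Reddit's public .json listing and search conventions as we understand them.

```python
from urllib.parse import urlencode

REDDIT = "https://www.reddit.com"

def build_listing_url(subreddits, sort="top", time_filter="week",
                      limit=100, keyword="", after=None):
    """Return a Reddit .json address for the given query settings (hypothetical helper)."""
    # Several comma-separated subreddits are joined with "+" into one combined listing.
    names = [s.strip() for s in subreddits.split(",") if s.strip()]
    joined = "+".join(names)
    # Reddit shows at most 100 posts per page, so larger requests are paged.
    params = {"limit": min(limit, 100)}
    if keyword:
        # Assumption: keyword queries use the subreddit's search endpoint,
        # restricted to that subreddit with restrict_sr.
        path = f"/r/{joined}/search.json"
        params.update({"q": keyword, "restrict_sr": 1,
                       "sort": sort, "t": time_filter})
    else:
        path = f"/r/{joined}/{sort}.json"
        # The time filter only applies to Top and Controversial.
        if sort in ("top", "controversial"):
            params["t"] = time_filter
    if after:
        # Marker for the next page, taken from the previous page's data.
        params["after"] = after
    return f"{REDDIT}{path}?{urlencode(params)}"

print(build_listing_url("science, AskScience"))
print(build_listing_url("personalfinance", keyword="student loans"))
```

Keeping the whole address visible, rather than hiding it behind a button, matches the tool's aim that nothing about the request is concealed.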

Gather from Reddit

Open this link in Reddit

Here's the exact address for your data. Open it (it loads Reddit's raw data view), then select everything and copy it.

The link opens a page of raw data (JSON) in a new tab. It looks like a wall of text — that's correct.
Click anywhere on that page, then press Ctrl+A (⌘ Cmd+A on Mac) to select all of it.
Press Ctrl+C (⌘ Cmd+C) to copy.
Return to this tab and paste it into Step 3.

This is the same public data your browser already loads when you visit Reddit — you're just fetching it yourself instead of through a blocked server. Nothing is hidden, and there's nothing to log in to.

Lay it on the Floor

Paste the data — Thresh beats the grain free

Paste everything you copied. Thresh reads it, separates out the posts, and adds them to your collection. It never leaves your browser.
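
For readers curious what "separates out the posts" involves, here is a minimal sketch under stated assumptions. It relies on the shape of Reddit's listing JSON (a "Listing" object whose data.children entries of kind "t3" are posts); the function name parse_listing and the FIELDS selection are illustrative, not taken from Thresh's source.

```python
import json

# Fields kept from each post; names follow Reddit's JSON keys.
FIELDS = ("id", "subreddit", "title", "author", "selftext", "score",
          "upvote_ratio", "num_comments", "created_utc", "permalink",
          "is_self", "link_flair_text", "domain")

def parse_listing(pasted_text):
    """Parse pasted listing JSON into (posts, next_page_marker)."""
    try:
        payload = json.loads(pasted_text)
    except json.JSONDecodeError as exc:
        raise ValueError("Not valid JSON; was the whole page copied?") from exc
    if not isinstance(payload, dict) or payload.get("kind") != "Listing":
        raise ValueError("Expected a Reddit 'Listing' object")
    data = payload.get("data", {})
    posts = []
    for child in data.get("children", []):
        if child.get("kind") != "t3":  # t3 is Reddit's type prefix for posts
            continue
        raw = child.get("data", {})
        # Missing keys become None rather than raising.
        posts.append({field: raw.get(field) for field in FIELDS})
    # "after" is None on the last page.
    return posts, data.get("after")
```

A partial copy (for example, a selection that missed the closing brace) fails at the json.loads step, which is why the sketch turns that error into a readable message.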

Building Bigger Datasets

Reddit shows up to 100 posts per page. For 250 or 500, Thresh walks you through the pages one paste at a time — after each paste it reads Reddit's own “next page” marker and hands you the following link. But the real power is systematic sampling:

Longitudinal snapshots: Gather 100 posts from the same subreddit once a week (or once a month) over several weeks. Each collection is timestamped and gets its own provenance record. Compare word frequencies across collections to track how the conversation evolves.
Multi-sort sampling: Gather the same subreddit with different sort methods. Top gives you what resonated. New gives you what people are saying right now. Controversial gives you what divides the community. Three collections, three lenses on the same data.
Cross-community comparison: Gather multiple related subreddits (e.g., r/jobs, r/careeradvice, r/recruitinghell) with the same keyword and time filter. The provenance records make each collection independently citable.

Each collection is saved separately in your browser. All exports include full provenance documentation, so every dataset is replicable regardless of when or how you gathered it.
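
If you export two snapshots and want to compare them outside Thresh, the comparison of word frequencies across collections can be sketched as follows. This is an illustration only: the function names and the min_count threshold are assumptions, and it works on plain lists of post text rather than on any Thresh file format.

```python
import re
from collections import Counter

def word_counts(texts):
    """Count lowercase words across a list of post texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

def emerging_terms(earlier, later, min_count=2):
    """Words used at least min_count times in the later snapshot
    that never appeared in the earlier one, most frequent first."""
    old, new = word_counts(earlier), word_counts(later)
    fresh = [w for w, n in new.items() if n >= min_count and w not in old]
    return sorted(fresh, key=lambda w: (-new[w], w))

january = ["rent is high", "rent went up"]
february = ["layoffs and rent", "more layoffs today"]
print(emerging_terms(january, february))
```

A stricter version might compare rates rather than raw counts, since two snapshots rarely contain the same number of words.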

Understanding Your Data — What Each Field Means

score — Net votes (upvotes minus downvotes). High score = community resonance. A post with 500 score was upvoted roughly 500 more times than it was downvoted.

upvote_ratio — Fraction of votes that were upvotes (0.0 to 1.0). A ratio of 0.95 means near-unanimous approval. Below 0.60 means the post is divisive — this is what "Controversial" sort finds.

num_comments — Total comments on the post. High comments + low score often means debate. High comments + high score means broad engagement.

created_utc — When the post was made (UTC timestamp). Exported as both Unix timestamp and ISO date for your analysis tools.

is_self — True if the post is text (a "self-post"). False if it's a link to an external site. Self-posts contain the author's own writing in the selftext field.

selftext — The body text of a self-post. This is the primary content for text analysis, sentiment, and theme extraction.

link_flair_text — Category label set by subreddit moderators (e.g., "Discussion", "News", "Vent"). Useful for filtering by post type in your analysis.

author — Reddit username. Anonymized by default in exports to protect privacy. Shown here for browsing context.

domain — Source domain for link posts (e.g., "nytimes.com") or "self.subreddit" for text posts. Useful for tracking which sources a community shares.
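
As a small illustration of how these fields can be read together, the sketch below derives a few convenience values from one post. The function name describe_post is hypothetical, the 0.60 cutoff is the rule of thumb given above rather than anything Reddit defines, and the ISO format shown may differ from the one Thresh writes into exports.

```python
from datetime import datetime, timezone

def describe_post(post):
    """Derive readable values from a post's raw fields (illustrative only)."""
    created = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc)
    return {
        # Unix timestamp rendered as an ISO 8601 date in UTC.
        "created_date": created.isoformat(),
        # Rule of thumb from the field notes: below 0.60 reads as divisive.
        "divisive": post["upvote_ratio"] < 0.60,
        # Self-posts carry the author's own text; link posts point elsewhere.
        "kind": "text" if post["is_self"] else "link",
    }

example = {"created_utc": 1704067200, "upvote_ratio": 0.55, "is_self": True}
print(describe_post(example))
```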

Format

CSV opens in Excel, Google Sheets, SPSS, and R. JSON is better for Python, JavaScript, or feeding into other tools.

Anonymize usernames

Replaces usernames with user_a1b2c3 hashes. Recommended for any published or shared work. Your choice is documented in provenance.txt.
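
The text above does not say how the hash is derived, so the following is only one plausible way to produce labels of the user_a1b2c3 shape: a SHA-256 digest truncated to six hex characters. The function name and the optional salt are assumptions for illustration.

```python
import hashlib

def anonymize(username, salt=""):
    """Map a username to a stable pseudonym such as user_a1b2c3 (assumed scheme)."""
    digest = hashlib.sha256((salt + username).encode("utf-8")).hexdigest()
    # Six hex characters keep labels short; the same input always
    # yields the same label, so one author stays one author.
    return f"user_{digest[:6]}"

print(anonymize("example_redditor"))
```

With only six hex characters, two different usernames can occasionally share a label in a very large collection, and an unsalted hash of a known username can be recomputed by anyone. Neither replaces the re-identification cautions that apply to the post text itself.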

What's in the Export ZIP?

posts.csv (or .json) — One row per post. Fields: id, subreddit, title, author, selftext, score, upvote_ratio, num_comments, created_utc, created_date, url, permalink, is_self, flair, domain.

comments.csv (if collected) — One row per comment. Fields: id, post_id, author, body, score, created_utc, created_date, depth, parent_id.

provenance.txt — Complete methodology record: tool version, subreddit(s), sort method, time filter, keyword, limits, post/comment counts, timestamp, anonymization setting, data source, known limitations, and ethical use guidance. Cite this directly in your methods section.
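
Once downloaded, the bundle can be opened with standard tools. Here is a minimal sketch of reading it in Python, assuming posts.csv and provenance.txt sit at the top level of the ZIP and use UTF-8; the function name load_export is hypothetical.

```python
import csv
import io
import zipfile

def load_export(zip_bytes):
    """Return (posts, provenance_text) from an export ZIP given as bytes."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open("posts.csv") as handle:
            text = io.TextIOWrapper(handle, encoding="utf-8", newline="")
            posts = list(csv.DictReader(text))
        provenance = archive.read("provenance.txt").decode("utf-8")
    # CSV values arrive as strings; convert the numeric columns.
    for post in posts:
        post["score"] = int(post["score"])
        post["num_comments"] = int(post["num_comments"])
        post["upvote_ratio"] = float(post["upvote_ratio"])
    return posts, provenance
```

To use it on a real download, read the file first, for example with pathlib.Path("export.zip").read_bytes(), where export.zip stands in for whatever name your browser saved.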

Want more than raw data? Scroll down to the AI Research Report section to generate a complete Introduction / Methods / Results / Discussion document from your collection. It combines everything: your metadata, statistics, word frequencies, and post content into a structured report you can build on.

AI Research Report

Generate a complete, downloadable research report from your collection. Claude reads your data, computes patterns, and writes a structured Introduction / Methods / Results / Discussion report that you can use as a starting point for your own analysis, a briefing for your editor, or a draft methods section for your thesis.

How it works:

Answer two questions below so Claude can frame the report around your specific research interest.
Click "Generate Report" and Claude will aggregate your collection metadata, summary statistics, word frequencies, and post content into a structured report.
Download the result as a beautifully formatted Word document (.docx) ready for editing, or as Markdown. The report includes full provenance so anyone reading it knows exactly how the data was gathered.

What is your research question?

This frames the Introduction and Discussion. Be as specific as you can. A good research question names the community, the topic, and the timeframe.

Who is this report for?

Each audience gets a different document structure. Academic produces a formal IMRaD paper. Journalist produces a lede-first column. Organizer produces a town hall action brief. General produces a clear op-ed.

Brief context (optional)

Anything that helps Claude understand why you collected this data and what you plan to do with the findings.

Built-in Analysis

These tools run entirely in your browser at no cost. Select a collection below and they update instantly:

Post Volume Over Time: A bar chart showing how many posts appeared on each day. Spikes mean something triggered a surge of conversation, such as a news event, a viral post, or a policy change. Flat stretches mean steady background chatter. Gaps mean silence. Look for spikes first, then check what posts appeared on those dates in Harvest.
Word Frequency Table: The top 20 most common words across all post titles and bodies, with stopwords removed. Copyable with one click. This is the literal vocabulary of the conversation: what people actually say when they talk about this topic.
Sortable Data Table: On the Harvest page, sort by any column. Sort by score for resonance, num_comments for engagement, or date for recency.
Summary Statistics: Post count, average score, average comments, and date range are calculated automatically for every collection.

Claude AI Analysis

Claude reads your collected posts and produces structured research analysis. It goes beyond counting words — it understands meaning, groups ideas, and identifies patterns a human analyst would find.

Identify Themes: Groups posts into thematic clusters and names each theme. Perfect for a first pass on unfamiliar data. Example output: "Theme 1: Access to Care (23 posts). Users describe long wait times, insurance denials…"
Sentiment Analysis: Classifies overall tone (positive, negative, neutral, mixed) and identifies emotional patterns. Useful for tracking how a community feels about a topic over time.
Summarize Discussion: Produces a structured summary: main points, areas of agreement, areas of disagreement, and standout observations. Good for briefings and literature reviews.
Extract Questions: Pulls out the questions people are asking, categorized by topic. Reveals information gaps and unmet needs, which is invaluable for journalists and service providers.
Custom Prompt: Ask Claude anything about your data. Your prompt is combined with a summary of up to 50 posts.

How to use this page:

Pick a collection from the dropdown below. The chart and word table update immediately.
Read the post volume chart — look for spikes (something happened) and gaps (silence). This tells you when the conversation moved.
Scan the word frequency table — these are the 20 most common words. This tells you what people are saying. Click "Copy" to grab it.
Run Claude AI analysis — choose an analysis type and click "Analyze with Claude." AI analysis is built in and free to use.
Copy or download the results. Use "Copy to Clipboard" for quick pasting, or "Save as .txt/.md" to keep a file for your records.

Collection

Selecting a collection updates the post volume chart and word frequency table below.

Analysis Type

Post Volume Over Time

Runs in your browser — no AI needed

Shows how many posts were created on each day in your collection. Spikes reveal when a topic surged — a news event, a viral post, or a policy announcement. Flat lines mean steady background conversation. Gaps mean the community went quiet. Use this alongside word frequency to see not just what people are saying but when they started saying it.
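
The daily counts behind a chart like this can be reproduced from exported data. The sketch below is an approximation, not Thresh's implementation: it buckets posts by UTC calendar day and fills quiet days with zero so that gaps stay visible. The function name posts_per_day is an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def posts_per_day(posts):
    """Return {ISO date: post count} for every day in the collection's range."""
    counts = Counter(
        datetime.fromtimestamp(p["created_utc"], tz=timezone.utc).date()
        for p in posts
    )
    if not counts:
        return {}
    day, last = min(counts), max(counts)
    series = {}
    while day <= last:
        # Days with no posts are recorded as 0 instead of being skipped.
        series[day.isoformat()] = counts.get(day, 0)
        day += timedelta(days=1)
    return series

sample = [{"created_utc": 1704067200}, {"created_utc": 1704070800},
          {"created_utc": 1704240000}]
print(posts_per_day(sample))
```

Bucketing by UTC day is a simplification; a community concentrated in one time zone may show slightly different daily totals in local time.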

Word Frequency

Instant, runs in your browser

Top 20 words from all post titles and bodies, with common stopwords removed. This is the literal vocabulary of the conversation — the words people actually use when talking about this topic.
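
A word table of this kind takes only a few lines to approximate. In the sketch below, the stopword list is a tiny stand-in (Thresh's actual list is not given here), and the tokenizing rule and function name top_words are assumptions.

```python
import re
from collections import Counter

# A deliberately small stand-in for a real stopword list.
STOPWORDS = {"a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
             "i", "in", "is", "it", "my", "of", "on", "or", "that", "the",
             "this", "to", "was", "with", "you"}

def top_words(posts, n=20):
    """Return the n most common non-stopword words in titles and bodies."""
    counts = Counter()
    for post in posts:
        # Treat missing or empty fields as empty strings.
        text = f"{post.get('title') or ''} {post.get('selftext') or ''}".lower()
        for word in re.findall(r"[a-z][a-z']*", text):
            if len(word) > 1 and word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(n)

posts = [
    {"title": "The wait times are long", "selftext": "Wait times keep growing"},
    {"title": "Insurance denied again", "selftext": ""},
]
print(top_words(posts, 5))
```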

A Letter from the Builder

I built this because I know what it feels like to drown.

Not in water. In data.

I spent years writing a dissertation on social media discourse during the COVID-19 pandemic. Collecting tweets. Analyzing what people were saying to each other while the world fell apart. And what I found was genuinely illuminating. The patterns were there. The fear, the misinformation, the solidarity, the grief, the stubborn and sometimes beautiful ways people tried to make sense of something that didn't make sense. It was some of the most meaningful work I've ever done.

But the process of getting there was brutal. Even for me. Even with training and tools and institutional support. It required programming. It required API credentials. It required understanding rate limits and data schemas and pagination logic and a dozen other things that had nothing to do with the actual research question I was trying to answer.

And I kept thinking: if this is this hard for me, it is impossible for most people. The teachers who sense something shifting in how their students talk about mental health online. The journalists who can feel a story forming in a subreddit but can't prove it yet. The community organizers. The grad students. They're locked out. Not because they lack the curiosity or the intellect, but because the tools weren't built for them.

So I built this.

It can do much of what my dissertation did. Smaller scale, because of Reddit's rate limits. But that's fine. Small is good, if pointed in the right place, focused with the right lens. A hundred posts from the right subreddit in the right week will tell you more about how a community thinks than a thousand posts scraped without intention.

The world right now is full of quantitative social science. AI has made it easy to count things, to run regressions, to produce charts at speed. I'm grateful for that. But this project aims at something different. It aims at breathing life back into a research method I hold dear: qualitative inquiry. The close reading. The careful listening. Not because I'm an expert in it. Exactly the opposite, really. I chose the quantitative path early in my training, and I have admired qualitative work from afar ever since, with the quiet reverence of someone who knows they took the other road. There is something mysterious in it. Something irreducible. The act of sitting with people's words and asking what do they mean rather than how many are there.

At the deepest core, my hope is that Thresh marries the two again. That it gives you the structure to collect carefully, the numbers to orient yourself, and then the space to read. Really read. What people are saying. That it empowers anyone interested in understanding public discourse to gather data more easily, more ethically, and more transparently than the old tools ever allowed.

The ultimate recursion is this: what was once my dissertation is now everybody's tool.

The waters rise. The feed is a flood. But the threshing floor is high ground, and the work of separation is ancient, and the grain is worth the labor.

With hope,

Jacob E. Thomas, MA, PhD

February 2026

Postscript — June 2026

Since I wrote the above, Reddit closed the door that Version 1 walked through. It now blocks the servers that tools like this run on. I could have fought it with tricks — rotating addresses, disguising the traffic — but that's the opposite of what this tool stands for. So I rebuilt it to ask you for help instead. In Version 2 you open the link yourself and carry the data back by hand. It's more work. It's also more honest, and more transparent, and — if I'm honest — truer to what a threshing floor always was: a place you walk to, carrying what you grew.

Version 2

Why the Method Changed

For most of Reddit's history, every page quietly offered a machine-readable twin. Add .json to almost any Reddit address and you'd get the same content as structured data — the same posts, the same comments, just in a form a program could read. It was public, unauthenticated, and open to anyone, including a browser or a small relay server. Version 1 of The Threshing Floor was built on that openness: you typed a subreddit, and a lightweight edge proxy fetched the JSON and handed it to your browser.

That openness narrowed. Beginning with the API pricing upheaval of 2023 and tightening steadily after, Reddit moved to restrict automated access to its data. By 2026 the enforcement had reached the data-center IP ranges that cloud platforms — including the one Thresh's proxy ran on — use. Requests from those addresses increasingly came back throttled, blocked, or empty. The proxy still worked in principle; Reddit simply stopped answering it.

Crucially, the data never became private. Anyone can still open those exact pages in a normal browser, on a normal home connection, and read them — logged in or not. What changed is who Reddit will serve: a person, yes; an anonymous server fetching on that person's behalf, no longer reliably.

Version 2 takes that constraint at face value. Instead of fetching for you, Thresh tells you exactly what to fetch:

You set your query, and Thresh builds the precise Reddit .json link — nothing hidden, the whole URL visible.
You open it in your own browser. Reddit serves you the data, because you are a real person on a real connection.
You copy what you see and paste it back onto the Floor. From there, everything is unchanged: parsing, word frequency, AI analysis, provenance, and export all work exactly as before.

This is, if anything, more transparent than before. In Version 1 the fetch happened on a server you couldn't see. In Version 2 you watch every request leave and every byte arrive. You are the proxy now — the human in the loop — and the provenance record reflects that: you collected this data, first-hand.

The trade-offs are real and worth stating plainly. It is slower. Large collections take several copy-pastes instead of one click. And because it depends on copying text between browser tabs, it is a desktop tool — phones make the select-all-and-copy step painful enough that we don't pretend it works well there. We think that's an honest price for a method that can't be quietly blocked, that hides nothing, and that keeps Thresh working without resorting to the kind of evasion this project exists to oppose.

The waters rose; the old bridge washed out. So we walk across. The floor is still high ground, and the grain is still worth the labor.

What Is This?

The Threshing Floor is a free, open-source tool for collecting and exporting Reddit data. It is designed for public health researchers, journalists, civic technologists, and anyone who believes public discourse is worth measuring.

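Collecting, browsing, and exporting run entirely in your browser. There is no database and no account to create. Your collections stay on your machine; only the optional AI analysis sends post content out, through the managed proxy described under AI Analysis.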

How It Works

Reddit's public pages serve the same content as machine-readable data when you add .json to the address. The Threshing Floor builds the precise link to the data you want; you open it in your own browser and paste what Reddit shows you back onto the Floor. From there, Thresh lets you browse, filter, analyze, and export it — all in your browser. (See Why the Method Changed above for the full story.)

No Reddit API key is required. No authentication of any kind. The data collected is limited to exactly what you can see by visiting Reddit in a web browser — because that is literally how it is gathered.

The workflow follows four steps. Each one builds on the last:

Thresh (Collect) — Set a subreddit, sort method, time filter, post limit, and optional keyword. Thresh builds a Reddit link; you open it, copy the data, and paste it back. For more than 100 posts, Thresh hands you each page's link in turn. Comments can be added later, per post, on the Harvest page. Everything is stored in your browser.
Harvest (Browse) — Explore your collected data in a sortable, searchable table. Click any post to read its full text and comments. Sort by score to see what resonates, by comments to see what sparks debate, or by date to follow the conversation over time.
Winnow (Analyze) — Start with the built-in word frequency table, then use Claude AI to identify themes, extract sentiment, summarize discussions, or ask custom questions about your data. AI analysis is built in and free to use.
Glean (Export) — Download your data as CSV or JSON, bundled in a ZIP with a provenance.txt file that documents exactly how it was collected. Or generate an AI Report: answer two questions about your research, choose your audience, and Claude produces a document tailored to your needs — an IMRaD paper for academics, a data-driven column for journalists, a town hall brief for organizers, or an op-ed for general audiences.

The AI Report Generator

The Glean page includes an AI Report Generator powered by Claude Opus 4.6, a model from Anthropic. You provide a research question, choose your audience, and optionally add context. Claude aggregates your collection metadata, summary statistics, word frequency analysis, and the content of up to 50 posts into a complete document.

Each audience gets a different document format:

Academic — Formal IMRaD paper (Introduction, Methods, Results, Discussion). Hedged language, rigorous limitations, replication-ready methodology. For theses, journal submissions, and IRB documentation.
Journalist — A data-driven column. Lede-first structure, compelling narrative, quotes from the data, and a methods note for transparency. For newsroom briefings and story pitches.
Community Organizer — A town hall brief. Key findings as talking points, community voices highlighted, recommended actions, and a statistical snapshot. For city council meetings, campaign prep, and advocacy reports.
General — An op-ed or explainer piece. Curious tone, narrative arc, accessible statistics, honest caveats. For personal research, newsletters, and public writing.

The report downloads as a formatted Word document (.docx) ready for editing in Microsoft Word or Google Docs. Markdown and clipboard copy are also available.

What was once a dissertation-level undertaking now takes ten minutes and a good question.

Your Data & Storage

Everything is stored in your browser only. There is no server database, no account system, and no cloud sync. Specifically:

Collections (posts, comments, configuration) are saved in your browser's localStorage.
An in-progress draft (a collection you're still gathering, page by page) is held locally so a refresh won't lose it.

AI analysis is powered by Claude Opus 4.6 via a secure managed proxy. No API keys are required or stored on your device. The Reddit data itself is fetched by you, in your own browser, and pasted in directly — it never passes through any Threshing Floor server.

This means your data does not sync across browsers or devices. If you switch browsers, your collections will not follow.

Clearing Your History

To erase all Thresh data from your browser:

Open your browser's Settings (or press Ctrl+Shift+Delete / Cmd+Shift+Delete)
Navigate to Privacy & Security → Clear browsing data
Select "Cookies and site data" (this includes localStorage). Note that this clears stored data for every site, not only Thresh.
To clear only Thresh: go to your browser's Developer Tools (F12), open the Application tab, expand Local Storage, find the Thresh site, and delete individual keys or click Clear All

This will remove all saved collections and any in-progress draft. It cannot be undone.

The Manual Harvest — A Desktop Process

Because Version 2 gathers data through your own browser, collecting is a hands-on, two-tab process: Thresh builds a link, you open it, copy the data Reddit shows you, and paste it back. A few things follow from that:

It is a desktop tool. The copy step means selecting a wall of text on the Reddit tab and pasting it here. That is awkward on a phone, so collecting new data is meant for a laptop or desktop. You can still browse, analyze, and export saved collections on any device.
You set the pace. There's no rate-limit gauge anymore, because Thresh isn't making automated requests — you are simply loading Reddit pages the way you always do. Be a courteous visitor: don't hammer reload, and leave a breath between pages.
Big collections come in pages. Reddit shows up to 100 posts per page. Ask for 250 or 500 and Thresh walks you through the pages one paste at a time, reading Reddit's own “next page” marker to hand you each following link.
Drafts are safe. A collection you're still gathering is held in your browser, so a refresh mid-harvest won't lose your progress. Save it to the Floor when you're done, or discard it and start over.
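
The page-by-page walk described above can be sketched in a few lines. This assumes Reddit's listing JSON carries its "next page" marker in data.after and that the following page is requested by adding an after parameter to the same address; both function names are hypothetical.

```python
import json

def pages_needed(max_posts, per_page=100):
    """Number of pastes required for a requested collection size."""
    return -(-max_posts // per_page)  # ceiling division

def next_page_url(base_url, pasted_text):
    """Return the link for the following page, or None when no marker is present."""
    after = json.loads(pasted_text)["data"].get("after")
    if not after:
        return None  # Reddit reports no further page
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}after={after}"

page = '{"kind": "Listing", "data": {"after": "t3_abc", "children": []}}'
print(pages_needed(250))
print(next_page_url("https://www.reddit.com/r/jobs/top.json?t=month&limit=100", page))
```

A listing can run out before the requested total is reached, in which case the marker is empty and the walk simply ends early.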

Ethical Considerations

Re-identification risk: Even with usernames removed, unique writing styles or specific details in posts may allow re-identification. Consider this when publishing findings.
IRB guidance: If you are conducting academic research, consult your Institutional Review Board about whether your data collection constitutes human subjects research.
Reddit's Terms: This tool accesses publicly available data. Please review Reddit's Terms of Service and API Terms regarding data collection and use.
Consent: Reddit users post publicly, but they may not expect their posts to be aggregated and analyzed. Handle data with care and respect.
Default anonymization: Exports anonymize usernames by default. You can disable this, but consider the implications before doing so.

Provenance

Every export includes a provenance.txt file documenting exactly how the data was collected: the subreddit, sort method, time filter, number of posts, date of collection, and any filters applied. This is the seal on every bundle — it gives you the language for a methods section, a transparency report, or a replication attempt.
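
To make the idea concrete, here is a rough sketch of how a record like this could be assembled from the query settings. The layout, labels, and header line are invented for illustration and should not be read as the actual contents of Thresh's provenance.txt.

```python
from datetime import datetime, timezone

def provenance_text(config, post_count, collected_at=None):
    """Build a plain-text methodology record (hypothetical layout)."""
    collected_at = collected_at or datetime.now(timezone.utc)
    lines = [
        "PROVENANCE RECORD",  # hypothetical header line
        f"Subreddit(s): {', '.join(config['subreddits'])}",
        f"Sort method: {config['sort']}",
        f"Time filter: {config['time_filter']}",
        f"Keyword: {config.get('keyword') or '(none)'}",
        f"Posts collected: {post_count}",
        f"Collected at (UTC): {collected_at.isoformat()}",
        f"Usernames anonymized: {'yes' if config.get('anonymize', True) else 'no'}",
    ]
    return "\n".join(lines) + "\n"

settings = {"subreddits": ["science", "conspiracy"], "sort": "top",
            "time_filter": "year", "keyword": "vaccine"}
print(provenance_text(settings, 100))
```

Writing the record from the same settings object that built the query keeps the two from drifting apart, which is the point of a provenance file.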

AI Analysis

Thresh uses Claude Opus 4.6 (by Anthropic) for analysis and research report generation. AI features are built in and free to use; no API key is needed.

AI requests are routed through a secure managed proxy at api.the-threshing-floor.com. The API key is stored server-side as an encrypted secret. Your data is sent to Anthropic for analysis and is not logged or stored by the proxy.

Open Source

The Threshing Floor is free and open source. The code is available on GitHub.

Get in Touch

Found a bug? Have a feature request? Want to share how you’re using Thresh? Reach out anytime:

[email protected]

Citation

Thomas, J.E. (2026). The Threshing Floor: A browser-based tool for Reddit data collection and export. https://github.com/jethomasphd/The_Threshing_Floor

Built using Latent Dialogic Space
