Profile Setup Guide

Your profile.json tells the matcher who you are and what you're looking for. The more detail you give, the better your matches will be.


Quick start

Copy the sample and edit it:

cp profile.sample.json profile.json

Then validate:

python job_matcher.py --validate

Full field reference

name

Type: string Required: no

Your name. Only used for display in the terminal header.

"name": "Anantha"

summary ⭐ most important field

Type: string Required: yes

Free-text description of your background. The LLM reads this to understand who you are and match you to jobs. Write at least 30 words and be specific.

What to include:

  • Years of experience
  • Your tech stack
  • What kind of products you have built
  • What roles you are targeting
  • What type of company you want
  • Location preferences
  • Visa status

Bad summary — too vague:

"summary": "Software engineer looking for jobs at startups"

Good summary — specific and rich:

"summary": "3 years experience as a full stack engineer. Strong in Python, React, FastAPI, PostgreSQL. Built B2B SaaS products from scratch at seed stage startups. Looking for founding engineer or early engineer roles at seed stage AI or B2B startups. Open to San Francisco or remote. No visa sponsorship needed."

The more specific your summary, the more accurate the LLM matching will be.
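As a rough self-check on the 30-word guidance, a summary's length can be counted before running the matcher. This is a minimal sketch: the helper names are hypothetical, and the threshold that job_matcher.py actually enforces is an assumption taken from the guidance above.

```python
def summary_word_count(summary: str) -> int:
    """Count whitespace-separated words in a summary string."""
    return len(summary.split())


def summary_is_long_enough(summary: str, min_words: int = 30) -> bool:
    # min_words=30 mirrors the "at least 30 words" guidance; the real
    # validator's threshold is an assumption, not confirmed by the tool.
    return summary_word_count(summary) >= min_words


vague = "Software engineer looking for jobs at startups"
print(summary_word_count(vague), summary_is_long_enough(vague))  # prints: 7 False
```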


years_experience

Type: integer Required: no

Your years of professional experience. Used for display only — the LLM decides seniority match semantically from your summary.

"years_experience": 3

skills

Type: array of strings Required: recommended

Your technical skills. Used by the LLM to score skill overlap with job requirements.

"skills": ["Python", "React", "FastAPI", "PostgreSQL", "Docker", "REST APIs"]

Be specific — include languages, frameworks, databases, and tools you actually know.


roles_looking_for

Type: array of strings Required: recommended

Job titles you are targeting. Passed to the LLM to improve matching accuracy.

"roles_looking_for": ["Founding Engineer", "Early Engineer", "Full Stack Engineer"]

Common YC job titles:

Founding Engineer
Founding Full Stack Engineer
Founding AI Engineer
Early Engineer
Software Engineer
Full Stack Engineer
Backend Engineer
Frontend Engineer
ML Engineer

locations

Type: array of strings Required: recommended

Locations you are open to working in. Use the exact region names from YC's filter system below.

"locations": ["America / Canada", "Remote"]

Available location options — use exactly as written:

| Value | What it covers |
|---|---|
| America / Canada | US and Canada based jobs |
| Remote | Remote jobs |
| Partly Remote | Hybrid remote jobs |
| Fully Remote | 100% remote jobs |
| Europe | UK, France, Germany, Spain, Netherlands, Sweden, Switzerland and all European countries |
| South Asia | India, Pakistan, Bangladesh, Nepal |
| Southeast Asia | Singapore, Indonesia, Philippines, Malaysia, Vietnam, Thailand |
| Latin America | Mexico, Brazil, Colombia, Argentina, Chile and more |
| Africa | Nigeria, Kenya, Ghana, South Africa and more |
| Middle East and North Africa | Israel, Egypt, UAE, Saudi Arabia, Turkey and more |
| East Asia | Hong Kong, China, South Korea, Japan |
| Oceania | Australia, New Zealand |

Examples:

"locations": ["America / Canada", "Remote"]
"locations": ["Fully Remote"]
"locations": ["Europe", "Remote"]
"locations": ["South Asia", "Remote"]
"locations": ["America / Canada", "Europe", "Fully Remote"]

industries

Type: array of strings Required: recommended

Industries you are interested in. Use the exact values from YC's industry filter below.

"industries": ["B2B", "Fintech", "Healthcare"]

Available industry options — use exactly as written:

| Value | Description |
|---|---|
| B2B | Business to business software and services |
| Consumer | Consumer products and apps |
| Fintech | Financial technology |
| Healthcare | Health and medical technology |
| Education | EdTech and learning |
| Industrials | Manufacturing, logistics, supply chain |
| Real Estate and Construction | PropTech and construction tech |
| Government | GovTech and civic tech |
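Because locations and industries both expect exact values, a quick check for typos can save a wasted run. The sketch below copies the allowed values from the lists in this guide. The helper name is hypothetical, and treating the match as case-sensitive is an assumption based on "use exactly as written".

```python
ALLOWED_LOCATIONS = {
    "America / Canada", "Remote", "Partly Remote", "Fully Remote", "Europe",
    "South Asia", "Southeast Asia", "Latin America", "Africa",
    "Middle East and North Africa", "East Asia", "Oceania",
}
ALLOWED_INDUSTRIES = {
    "B2B", "Consumer", "Fintech", "Healthcare", "Education", "Industrials",
    "Real Estate and Construction", "Government",
}


def unknown_values(values, allowed):
    """Return the entries that are not an exact match for an allowed value."""
    return [v for v in values if v not in allowed]


# "remote" (lowercase) and "FinTech" (wrong capitalization) are flagged.
print(unknown_values(["America / Canada", "remote"], ALLOWED_LOCATIONS))
print(unknown_values(["B2B", "FinTech"], ALLOWED_INDUSTRIES))
```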

not_interested_in

Type: array of strings Required: no

Topics to hard-filter out before any job reaches the LLM. Jobs that mention one of these keywords anywhere in their title, description, or skills are removed automatically, which saves API calls and improves result quality.

"not_interested_in": ["Web3", "Gaming", "Defense", "Crypto", "Enterprise", "Sales"]

These are case-insensitive substring matches:

"Web3"    → removes any job where "web3" appears anywhere
"Gaming"  → removes any job where "gaming" appears anywhere
"Defense" → removes any job where "defense" appears anywhere

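Use this for topics you absolutely do not want. Because these are substring matches, a short keyword also matches inside longer words: "Sales" removes jobs that mention "Salesforce". For conditions that need judgment rather than keyword matching, use deal_breakers instead, which lets the LLM decide.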
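The case-insensitive substring behaviour can be sketched as follows. This is an illustration only: the function name and the job dictionary keys (title, description, skills) are assumptions, not the actual job_matcher.py internals.

```python
def is_excluded(job: dict, not_interested_in: list) -> bool:
    """Return True if any keyword appears anywhere in the job's text."""
    # Hypothetical job shape: "title" and "description" are strings,
    # "skills" is a list of strings.
    parts = [job.get("title", ""), job.get("description", "")]
    parts.extend(job.get("skills", []))
    haystack = " ".join(parts).lower()
    # Empty keywords are skipped, since "" is a substring of everything.
    return any(k.lower() in haystack for k in not_interested_in if k)


job = {"title": "Backend Engineer", "description": "Build APIs", "skills": ["Salesforce"]}
print(is_excluded(job, ["Web3", "Gaming"]))  # prints: False
print(is_excluded(job, ["Sales"]))           # prints: True ("sales" is inside "salesforce")
```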


deal_breakers

Type: array of strings Required: no

Hard conditions that are absolute nos. These get added to the LLM prompt as:

HARD NO — skip if any apply: no equity, no remote, below $100K

The LLM will skip jobs where these conditions apply.
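The prompt line shown above can be assembled from the deal_breakers array with a simple join. This is a sketch with a hypothetical function name. The dash in the prefix is written as the escape \u2014 to match the format shown, and returning an empty string for an empty list is an assumption.

```python
def deal_breaker_line(deal_breakers):
    """Build the HARD NO prompt line from a list of deal breakers."""
    if not deal_breakers:
        # Assumption: with no deal breakers, nothing is added to the prompt.
        return ""
    return "HARD NO \u2014 skip if any apply: " + ", ".join(deal_breakers)


print(deal_breaker_line(["no equity", "no remote", "below $100K"]))
```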

"deal_breakers": ["no equity", "no remote", "below $100K"]

Examples:

"no equity"
"no remote"
"below $80K salary"
"India only"
"requires 6+ years"
"no sponsorship"

needs_visa

Type: boolean Required: yes

Whether you need visa sponsorship to work in the US.

"needs_visa": false

  • false: you are a US citizen or resident and do not need sponsorship
  • true: you need the company to sponsor your visa

When true, jobs that say "US citizen only" and do not mention sponsorship get filtered out automatically before the LLM.
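The rule above can be sketched as a small predicate. The function name and the exact phrases searched for ("us citizen only", "sponsor") are assumptions for illustration. The real filter may look for different wording.

```python
def passes_visa_filter(job_text: str, needs_visa: bool) -> bool:
    """Return False only for jobs the visa rule would remove."""
    if not needs_visa:
        # The rule applies only when sponsorship is needed.
        return True
    text = job_text.lower()
    citizen_only = "us citizen only" in text
    mentions_sponsorship = "sponsor" in text
    # Removed only when the job is citizen-only AND never mentions sponsorship.
    return not (citizen_only and not mentions_sponsorship)


print(passes_visa_filter("US citizen only.", needs_visa=True))   # prints: False
print(passes_visa_filter("US citizen only.", needs_visa=False))  # prints: True
```

In this sketch, a job that mentions sponsorship in any way passes the filter and is left for the LLM to judge.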


llm_provider

Type: string Required: yes

Which LLM provider to use for matching.

"llm_provider": "groq"

Available options:

| Provider | Cost | Speed | Quality | Notes |
|---|---|---|---|---|
| ollama | Free | Medium | Good | Runs locally, needs Ollama installed |
| groq | Free tier | Fast | Great | Recommended for most users |
| openai | Paid | Fast | Great | GPT models |
| claude | Paid | Fast | Great | Anthropic models |
| gemini | Free tier | Fast | Great | Google models |

Recommendation: Start with groq — free tier, fast, and high quality. Get a key at console.groq.com.


model

Type: string Required: no (uses provider default if empty)

Specific model name to use. Leave it as an empty string ("") to use the provider default.

"model": "llama-3.3-70b-versatile"

Default models per provider:

| Provider | Default model |
|---|---|
| ollama | llama3.1:8b |
| groq | llama-3.3-70b-versatile |
| openai | gpt-4o-mini |
| claude | claude-haiku-4-5-20251001 |
| gemini | gemini-2.5-flash |
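The fallback from an empty model to the provider default can be sketched as a dictionary lookup. The defaults are copied from the list above. The function name and the error raised for an unknown provider are assumptions.

```python
DEFAULT_MODELS = {
    "ollama": "llama3.1:8b",
    "groq": "llama-3.3-70b-versatile",
    "openai": "gpt-4o-mini",
    "claude": "claude-haiku-4-5-20251001",
    "gemini": "gemini-2.5-flash",
}


def resolve_model(provider: str, model: str = "") -> str:
    """Return the explicit model if set, otherwise the provider default."""
    if model:
        return model
    if provider not in DEFAULT_MODELS:
        # Assumption: an unknown provider is treated as a configuration error.
        raise ValueError(f"Unknown llm_provider: {provider!r}")
    return DEFAULT_MODELS[provider]


print(resolve_model("groq"))              # prints: llama-3.3-70b-versatile
print(resolve_model("openai", "gpt-4o"))  # prints: gpt-4o
```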

Other good options:

Ollama (local, check your RAM):

llama3.2:3b     → needs 2GB RAM, fast
llama3.1:8b     → needs 5GB RAM, better quality

Groq (free, runs on their servers — no RAM needed):

llama-3.3-70b-versatile   → best quality, recommended
llama-3.1-8b-instant      → faster, lighter

OpenAI:

gpt-4o-mini   → cheap and fast
gpt-4o        → best quality, more expensive

Gemini:

gemini-2.5-flash       → fast, good quality
gemini-2.5-flash-lite  → faster, lighter

api_key

Type: string Required: yes (except for ollama)

Your API key for the chosen provider. Not needed if using ollama.

"api_key": "gsk_your_groq_key_here"

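Where to get your key: for Groq, sign up at console.groq.com. For other providers, create a key in that provider's developer console.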


top_n

Type: integer Required: no (default: 10)

How many final matches to show in the terminal and save to files.

"top_n": 10

| Value | Best for |
|---|---|
| 10 | Focused, high quality shortlist |
| 20 | Broader view |
| 30 | Maximum coverage |

Note: keep top_n well below scan_limit. If you scan 50 jobs and ask for 30 results, you will likely get fewer than 30, because not every scanned job scores above the match threshold.


scan_limit

Type: integer or null Required: no (default: null = scan all)

How many jobs to send to the LLM after hard filtering. Controls the tradeoff between speed, cost, and coverage.

Jobs are shuffled randomly before the limit is applied, so each run sees a different subset and multiple runs give better coverage.

"scan_limit": 100

| Value | Jobs scanned | Speed | API calls | Best for |
|---|---|---|---|---|
| 30 | 30 | Very fast | ~6 | Quick test run |
| 100 | 100 | Fast | ~20 | Daily use |
| 200 | 200 | Medium | ~40 | Thorough search |
| null | All 500+ | Slow | ~100+ | Maximum coverage |
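The shuffle-then-limit behaviour, and the API call counts listed above, can be sketched as follows. The function names are hypothetical. The batch size of 5 jobs per call is an inference from the listed figures (30 jobs for about 6 calls, 100 for about 20), not a documented setting.

```python
import math
import random


def select_jobs(jobs, scan_limit=None, seed=None):
    """Shuffle a copy of the jobs, then keep at most scan_limit of them."""
    rng = random.Random(seed)  # a seed is only useful for reproducible tests
    shuffled = list(jobs)
    rng.shuffle(shuffled)
    # scan_limit of None means "scan all".
    return shuffled if scan_limit is None else shuffled[:scan_limit]


def estimate_api_calls(n_jobs: int, batch_size: int = 5) -> int:
    # batch_size=5 is an assumption inferred from the figures above.
    return math.ceil(n_jobs / batch_size)


subset = select_jobs(range(500), scan_limit=100)
print(len(subset), estimate_api_calls(len(subset)))  # prints: 100 20
```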

Good combinations:

"scan_limit": 100, "top_n": 10    ← recommended daily use
"scan_limit": 200, "top_n": 20    ← thorough search
"scan_limit": 50,  "top_n": 10    ← quick run
"scan_limit": null, "top_n": 30   ← scan everything

Complete example

{
  "name": "Anantha",
  "summary": "3 years experience as a full stack engineer. Strong in Python, React, FastAPI, PostgreSQL. Built B2B SaaS products from scratch at seed stage startups. Looking for founding engineer or early engineer roles at seed stage AI or B2B startups. Open to San Francisco or remote. No visa sponsorship needed.",
  "years_experience": 3,
  "skills": ["Python", "React", "FastAPI", "PostgreSQL", "REST APIs", "Docker"],
  "roles_looking_for": ["Founding Engineer", "Early Engineer", "Full Stack Engineer"],
  "locations": ["America / Canada", "Remote"],
  "industries": ["B2B", "Healthcare", "Fintech"],
  "not_interested_in": ["Web3", "Gaming", "Defense", "Crypto"],
  "deal_breakers": ["no equity", "no remote"],
  "needs_visa": false,
  "llm_provider": "groq",
  "model": "llama-3.3-70b-versatile",
  "api_key": "gsk_your_key_here",
  "top_n": 10,
  "scan_limit": 100
}

Minimal example (required fields, plus optional name and model)

{
  "name": "Your Name",
  "summary": "3 years of experience as a full stack engineer. Strong in Python, React, and FastAPI. Looking for founding engineer roles at seed stage AI startups. Open to San Francisco or remote. No visa sponsorship needed.",
  "needs_visa": false,
  "llm_provider": "ollama",
  "model": "llama3.1:8b",
  "api_key": ""
}
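To see which required fields a profile is missing, the rules in the field reference (summary, needs_visa, and llm_provider are required, and api_key is required unless the provider is ollama) can be sketched as a small check. The function name is hypothetical and this is not the logic of the built-in --validate flag.

```python
import json

REQUIRED_FIELDS = ("summary", "needs_visa", "llm_provider")


def missing_required(profile: dict) -> list:
    """List required fields that are absent from a profile dictionary."""
    missing = [f for f in REQUIRED_FIELDS if f not in profile]
    # api_key is required for every provider except ollama.
    if profile.get("llm_provider") != "ollama" and not profile.get("api_key"):
        missing.append("api_key")
    return missing


profile = json.loads(
    '{"summary": "placeholder", "needs_visa": false, '
    '"llm_provider": "ollama", "api_key": ""}'
)
print(missing_required(profile))  # prints: []
```

To check a real file, load it with json.load(open("profile.json")) and pass the result to the same function.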

Tips for better results

Write a detailed summary. The summary is the single most important field. The LLM reads it and decides what is a match. More detail means better matches.

Use not_interested_in aggressively. Add entire topics or industries you never want. It saves API calls and removes noise before the LLM ever sees those jobs.

Increase scan_limit for better coverage. With scan_limit: 50 you only see 50 random jobs per run. Run multiple times across a few days to cover the full dataset.

Start with Groq. Free tier, fast, uses a 70B model. Best quality at zero cost. Get a free key at console.groq.com.

Validate before first run

python job_matcher.py --validate

Validate your profile

python job_matcher.py --validate

Output when good:

✓ Profile looks good — ready to match

Output with warnings:

⚠ Profile warnings:
  • Summary is too short — add more detail for better matches
  • No industries set — add preferred industries