How sourcing platforms keep working as LinkedIn tightens API access

The LinkedIn API is OAuth-only. Public profiles are gone. So how do sourcing platforms still ship? The layered architecture behind every modern candidate index.

Siva

25 May 2026

I get one engineering question more than any other from agency owners and product people in our space. How do you keep a candidate index fresh against LinkedIn in 2026 when the API is gated, the public profile is gone, and every "free" sourcing tool seems to be one cease-and-desist away from disappearing? It is a fair question. The honest answer is that nobody is doing it through a single magic API. Sourcing platforms that work in 2026 are layered. There is no other shape that survives.

This post walks through the layers, what each one actually does, where the boundaries are, and what every agency should be checking when a sourcing vendor pitches them in 2026.

The LinkedIn API is not what you think it is

First, the boring fact base. LinkedIn's official API is documented on Microsoft Learn, which Microsoft maintains as the canonical source for LinkedIn developer documentation. The Profile API is gated behind OAuth 2.0. There are two flows: 3-legged OAuth, where a specific LinkedIn member grants your application access to their account, and 2-legged OAuth, where your application authenticates as itself for endpoints that are not tied to a specific member.

The Microsoft docs are blunt about what the 3-legged flow means. "Your application has no access to these resources without member approval." That single sentence is the entire story of why public profile scraping does not work as a sustainable strategy. The Profile API only returns data for members who have actively authenticated with your application and granted permission. You cannot pull a stranger's profile via the official API. You never could, and enforcement since 2022 has closed off the informal routes as well.
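To make the difference between the two flows concrete, here is a minimal Python sketch that only builds the requests and never calls LinkedIn. The two endpoint URLs are LinkedIn's documented OAuth 2.0 endpoints. The client ID, secret, redirect URI and scope values are placeholders, and the scopes an application may actually request depend on which products LinkedIn has approved for it.

```python
import secrets
from urllib.parse import urlencode

AUTH_URL = "https://www.linkedin.com/oauth/v2/authorization"
TOKEN_URL = "https://www.linkedin.com/oauth/v2/accessToken"


def member_authorization_url(client_id, redirect_uri, scopes, state=None):
    """3-legged flow, step 1: the URL a member must visit to approve access."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        # Random state value guards the redirect against CSRF.
        "state": state or secrets.token_urlsafe(16),
        "scope": " ".join(scopes),
    }
    return f"{AUTH_URL}?{urlencode(params)}"


def member_token_request(client_id, client_secret, redirect_uri, code):
    """3-legged flow, step 2: form body to POST to TOKEN_URL.

    `code` only exists after a member has approved the application.
    """
    return {
        "grant_type": "authorization_code",
        "code": code,
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
    }


def application_token_request(client_id, client_secret):
    """2-legged flow: the application authenticates as itself, no member involved."""
    return {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }
```

The point to notice is that the member token request cannot be built without a `code`, and a `code` is only issued after a specific member clicks approve. Nothing in either flow lets an application name an arbitrary member and fetch their profile.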

For recruitment-specific use cases, the gate is even higher. Recruiter System Connect (RSC), which is what serious ATS integrations use, requires partner-level access. Microsoft's docs state directly: "The use of these APIs is restricted to those developers approved by LinkedIn. Please reach out to your LinkedIn Relationship Manager or Business Development contact as you will need to meet certain criteria and sign an API agreement with data restrictions in order to use this integration." Translation: you sign a contract, you get audited, you carry an obligation to handle the data within tight rules, and you do not get to repackage it for general sourcing.

If you have ever wondered why almost no recruitment platform claims to have "full LinkedIn API access", this is why. The ones who do have RSC have it for narrow ATS-sync use cases. The ones who do not have RSC are building from other sources.

What sourcing platforms actually do, in layers

Given the API position above, every sourcing platform that ships in 2026 is built on a layer cake of data sources. No single source is rich enough on its own. The platforms that work are the ones that resolve and reconcile across many. Inside Recruitly the layers look roughly like this, from candidate-controlled at the bottom to public-web at the top.

Layer 1: candidate-volunteered data. CVs the candidate sent you, applications they submitted, profiles they imported themselves. This is the cleanest, most legally defensible, and most up-to-date layer. It is also the smallest. No sourcing tool can live on Layer 1 alone, but every sourcing tool's primary record should start here.

Layer 2: team-side enrichment via a browser extension. When a recruiter opens a LinkedIn profile in their own logged-in browser session, our extension can read what is on the screen for that recruiter and write structured fields back into the CRM. This is the recruiter's own legitimate session viewing a profile they would have viewed anyway. The data flows from recruiter to CRM, not from LinkedIn to CRM via API. We covered this in the post on your team's LinkedIn network as an untapped asset. This layer is the single largest source of fresh candidate data in any modern agency stack.

Layer 3: aggregated public signals from non-LinkedIn sources. GitHub, Stack Overflow, conference speaker lists, podcast appearances, published papers, government registries of professionals (doctors, lawyers, accountants in many jurisdictions), company About pages, press releases announcing senior hires. None of these is LinkedIn. All of them are public. None of them are forbidden. Stitched together, they can populate a surprisingly rich profile for a senior candidate without touching the LinkedIn perimeter at all.

Layer 4: third-party enrichment vendors who carry their own legal relationships. These vendors source candidate data through their own consent flows and licence it. You consume their API, they handle the upstream data contracts. This is where the legal risk lives if you pick the wrong vendor. Pick one that can show you their consent model and where their data comes from. If they cannot, they are the risk you are buying.

Layer 5: candidate consent flows back to your CRM. When a candidate applies, accepts an outreach message, joins a talent community, or completes a form, you have an explicit relationship and consent. That consent is the single most defensible status any record can have under GDPR, UAE PDPL, California's automated-decision system (ADS) regulations, or any of the AI hiring laws we cover in the AI Act series. Build for it on purpose.
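One way to picture "resolve and reconcile across many" is to tag every observed field with the layer it came from and the date it was seen, then pick a winner per field. The Python sketch below does that. The trust ranking in `TRUST_ORDER` is an assumption made for illustration, not a description of how Recruitly or any other platform actually ranks its sources.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum


class Layer(IntEnum):
    CANDIDATE_VOLUNTEERED = 1
    TEAM_EXTENSION = 2
    PUBLIC_SIGNALS = 3
    THIRD_PARTY_VENDOR = 4
    CONSENT_FLOW = 5


@dataclass(frozen=True)
class FieldValue:
    name: str          # e.g. "title" or "email"
    value: str
    layer: Layer       # which layer produced this observation
    observed_on: date  # when it was last seen


# Hypothetical ranking: candidate-controlled layers outrank everything else.
TRUST_ORDER = [
    Layer.CONSENT_FLOW,
    Layer.CANDIDATE_VOLUNTEERED,
    Layer.TEAM_EXTENSION,
    Layer.THIRD_PARTY_VENDOR,
    Layer.PUBLIC_SIGNALS,
]


def _rank(obs):
    # Lower sorts first: more trusted layer, then more recent observation.
    return (TRUST_ORDER.index(obs.layer), -obs.observed_on.toordinal())


def reconcile(observations):
    """Pick one value per field and keep the provenance of the winner."""
    best = {}
    for obs in observations:
        current = best.get(obs.name)
        if current is None or _rank(obs) < _rank(current):
            best[obs.name] = obs
    return best
```

Because the winner is a full `FieldValue` rather than a bare string, the merged profile can still answer "where did this come from and when" for every field.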

The dirty secret about "search 800M profiles"

Almost every sourcing platform's homepage now boasts a number in the hundreds of millions. Sometimes the number is real. Sometimes it is the union of every public source they could find, with massive overlap, stale records, and no last-verified date attached. The number alone tells you almost nothing.

The questions that actually matter when a vendor pitches you a big number are these. What was the data source for each record? When was each record last verified? What is the consent or legitimate-interest basis? What happens when a candidate exercises their right to deletion under the GDPR or PDPL? Can the vendor show you the deletion logs?

A platform that has 800 million records but cannot answer those questions is a platform that ships you a future legal problem at scale. A platform with 80 million records that can answer all of them is a stronger sourcing partner.
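If a vendor hands over a sample export, those questions can be turned into a quick check. The sketch below is a hypothetical audit in Python: the field names on `VendorRecord` and the 365-day staleness cutoff are assumptions for illustration, so map them to whatever the vendor's export actually contains.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class VendorRecord:
    record_id: str
    source: Optional[str]          # where the record came from
    last_verified: Optional[date]  # when it was last checked
    legal_basis: Optional[str]     # e.g. "consent" or "legitimate_interest"


def provenance_gaps(record, today, max_age_days=365):
    """List the provenance questions this record cannot answer."""
    gaps = []
    if not record.source:
        gaps.append("no source")
    if record.last_verified is None:
        gaps.append("never verified")
    elif (today - record.last_verified).days > max_age_days:
        gaps.append("stale")
    if not record.legal_basis:
        gaps.append("no legal basis")
    return gaps


def answerable_share(records, today):
    """Fraction of records with no provenance gaps at all."""
    if not records:
        return 0.0
    clean = sum(1 for r in records if not provenance_gaps(r, today))
    return clean / len(records)
```

Running `answerable_share` over a sample of a few thousand records gives a number that is far more useful in a vendor conversation than the headline profile count.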

The recruitment industry has spent a decade making "number of profiles" the headline metric. The next five years will move the headline metric to "data provenance and freshness". The platforms that win that shift are the ones who treat each record as a small contract, not a row in a CSV.

Identity resolution: the engineering problem nobody talks about

When you stitch five layers together, you end up with the same person represented by five different records, with overlapping but not identical fields. The engineering problem is figuring out which records are the same person and merging them without losing information. This is the same problem I walked through in how we dedupe candidate records, but at a different scale and with different inputs.

The single strongest cross-source identifier remains the normalised LinkedIn vanity URL. It is editable, but the edit rate is low enough that vanity matching across sources has very high precision. If two records from two completely different layers carry the same normalised vanity, that is the same person until proven otherwise. Phone numbers in E.164 are the next strongest, then verified work email, then full-name plus an overlapping employer-history fingerprint.
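A minimal Python sketch of that precedence follows. The normalisation rules are simplified assumptions: the vanity is taken as the lowercased path segment after `/in/`, and the phone check only validates E.164 shape rather than converting local formats, which in practice needs a proper phone-number library.

```python
import re
from urllib.parse import unquote, urlparse


def normalise_vanity(url):
    """Return the lowercased vanity slug from a LinkedIn profile URL, or None."""
    if not url:
        return None
    candidate = url.strip()
    if "://" not in candidate:
        candidate = "https://" + candidate
    parsed = urlparse(candidate)
    host = (parsed.hostname or "").lower()
    if host != "linkedin.com" and not host.endswith(".linkedin.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    # Only member profile URLs of the form /in/<vanity> are accepted.
    if len(parts) < 2 or parts[0].lower() != "in":
        return None
    return unquote(parts[1]).lower()


def is_e164(phone):
    """True if the string has E.164 shape: '+' then up to 15 digits, no leading 0."""
    return bool(phone) and re.fullmatch(r"\+[1-9]\d{1,14}", phone) is not None


# Strongest identifier first, matching the order described above.
KEY_PRECEDENCE = ("vanity", "phone_e164", "work_email")


def strongest_shared_key(a, b):
    """Return the strongest identifier on which two normalised records agree."""
    for key in KEY_PRECEDENCE:
        if a.get(key) and a.get(key) == b.get(key):
            return key
    return None
```

Tracking parameters, country subdomains and letter case are stripped before comparison, which is what lets a vanity captured by a browser extension match one that arrived in a CV.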

When signals disagree, the policy matters more than the algorithm. We default to "create suggested merge, do not auto-apply" for anything below a strong-signal threshold. The cost of an over-eager merge that joins two real people is much higher than the cost of a missed merge that leaves two records for the same person. You can always merge later. You cannot easily un-merge once a recruiter has acted on the combined record.
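The policy can be written down in a few lines. In this Python sketch the signal weights and both thresholds are invented for illustration. What matters is the shape: agreement on a strong identifier with no conflicts can merge automatically, anything weaker or contested becomes a suggestion for a human, and the rest stays separate.

```python
from enum import Enum


class MergeDecision(Enum):
    AUTO_MERGE = "auto_merge"
    SUGGEST = "suggest"
    KEEP_SEPARATE = "keep_separate"


# Hypothetical weights, ordered as in the text: vanity, phone, email, name+employers.
SIGNAL_WEIGHTS = {
    "vanity": 0.95,
    "phone_e164": 0.85,
    "work_email": 0.75,
    "name_and_employers": 0.5,
}
AUTO_THRESHOLD = 0.9     # assumed strong-signal threshold
SUGGEST_THRESHOLD = 0.5  # below this, do not even suggest


def decide_merge(agreeing, conflicting):
    """Decide what to do with two records given which signals agree or conflict."""
    if not agreeing:
        return MergeDecision.KEEP_SEPARATE
    score = max(SIGNAL_WEIGHTS.get(signal, 0.0) for signal in agreeing)
    if score < SUGGEST_THRESHOLD:
        return MergeDecision.KEEP_SEPARATE
    # Any disagreement between identifiers blocks auto-apply: a wrong merge
    # is costlier than a missed one, so a person makes the call.
    if conflicting or score < AUTO_THRESHOLD:
        return MergeDecision.SUGGEST
    return MergeDecision.AUTO_MERGE
```

For example, two records that share a vanity but carry different phone numbers come back as `SUGGEST`, not `AUTO_MERGE`.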

Freshness is harder than discovery

Finding a candidate's profile once is easy. Keeping that profile current as the candidate moves jobs, changes phone numbers, updates skills, and edits their LinkedIn is much harder. Most "800 million profile" indexes have a freshness distribution that looks ugly when you actually plot it: a small head of records updated in the last month, and a long tail of records last updated three or four years ago.

The layered approach helps here too. Every time a recruiter on your team opens a profile via the extension, Layer 2 refreshes that record. Every time a candidate applies or interacts, Layer 5 refreshes it. Every nightly run against public signals refreshes a slice of Layer 3. The records that get used most often stay fresh because using them is what refreshes them. The records that nobody touches grow stale and eventually fall out of the active index. This is how a layered platform stays useful at scale without pretending every record is equally trustworthy.
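The refresh-on-use behaviour is simple to model. This Python sketch is a toy in-memory index, and the two-year cutoff for dropping out of the active set is an assumed value. A real system would persist this and choose the cutoff per market.

```python
from datetime import date, timedelta


class CandidateIndex:
    """Toy index where any use of a record refreshes its last-verified date."""

    def __init__(self, stale_after_days=730):
        # record_id -> (last verified date, layer that verified it)
        self.last_verified = {}
        self.stale_after = timedelta(days=stale_after_days)

    def touch(self, record_id, layer, on):
        """Record a refresh from an extension view, an application, or a nightly run."""
        previous = self.last_verified.get(record_id)
        # Keep only the most recent verification; older events are ignored.
        if previous is None or on > previous[0]:
            self.last_verified[record_id] = (on, layer)

    def active(self, today):
        """Records verified recently enough to stay in the active index."""
        return {
            record_id
            for record_id, (seen, _layer) in self.last_verified.items()
            if today - seen <= self.stale_after
        }
```

Nothing is deleted when a record goes stale here. It simply stops appearing in `active()` until some layer touches it again.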

The implication for agencies is that the value of a sourcing index is concentrated in the records your team actually touches. A vendor's total profile count is a vanity number. The number of profiles your team has interacted with in the last 90 days is the operationally meaningful one. Ask vendors for that number. Most cannot give it to you.

Consent is not a tax. It is the moat.

The recruitment industry spent years treating GDPR, PDPL and now the various AI hiring laws as compliance overhead. The agencies who took them seriously and built consent flows into their candidate journeys are now sitting on the most defensible sourcing assets in the market. Consent is not a cost. It is a moat against the platforms that grew their numbers by skimming public web data and are now spending six figures on lawyers.

Practically, this means three things in product design. Make the candidate consent journey short and honest. Tell candidates clearly what data you hold, how you got it, and how they can change or delete it. Build deletion as a first-class operation, not a hidden form. Log every consent and every change to it with a timestamp and a basis. When the regulator asks (and they are starting to, as we covered in the UK AI hiring law guide), you produce the log and the conversation ends.
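The logging part is the easiest to get right early. Below is a minimal Python sketch of an append-only consent log, with every entry carrying a timestamp and a basis. The action names and the in-memory list are assumptions for illustration. A production log would sit in durable, tamper-evident storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ConsentEvent:
    candidate_id: str
    action: str   # "granted", "withdrawn" or "erasure_requested" (assumed names)
    basis: str    # e.g. "application_form" or "outreach_reply"
    at: datetime


class ConsentLog:
    """Append-only: events are added, never edited or removed."""

    def __init__(self):
        self._events = []

    def record(self, candidate_id, action, basis, at=None):
        event = ConsentEvent(candidate_id, action, basis, at or datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def history(self, candidate_id):
        """Everything that ever happened for one candidate, in the order recorded."""
        return [e for e in self._events if e.candidate_id == candidate_id]

    def has_consent(self, candidate_id):
        """Current status is whatever the most recently recorded event says."""
        events = self.history(candidate_id)
        return bool(events) and events[-1].action == "granted"
```

When the regulator asks, `history(candidate_id)` is the document you hand over.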

Agencies that try to retrofit consent on top of a scraped index will find the cost much higher than building it correctly from the start. There is no shortcut.

What to look for when picking a sourcing platform in 2026

Ask which layers they use. A vendor who cannot describe their data sources by layer is selling you a black box. The good ones can tell you what comes from where, how often each layer refreshes, and what happens if one layer goes dark.

Ask about the LinkedIn relationship. If they claim RSC, ask which functional modules they use. The Microsoft Learn docs list five modules; a real RSC partner will know them. If they imply they have broad LinkedIn API access for sourcing, that is either RSC for ATS sync (limited to specific operations) or it is wishful thinking. There is no "general sourcing" LinkedIn API.

Ask about deletion handling. When a candidate exercises a right to erasure, what happens? If the answer is "we soft-delete in our index and notify downstream consumers", they thought about it. If the answer is "we delete from our database", ask what happens to enriched records they handed to your CRM yesterday.
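For a sense of what the good answer involves, here is a Python sketch of one possible reading of it. The vendor drops the personal data, keeps a tombstone so the record is not quietly re-ingested from an upstream source, and queues a notice for every consumer that ever received the record. All class and method names are hypothetical.

```python
from datetime import datetime, timezone


class VendorIndex:
    def __init__(self):
        self.records = {}     # record_id -> personal data
        self.tombstones = {}  # record_id -> erasure timestamp
        self.deliveries = {}  # record_id -> set of consumers who received it
        self.outbox = []      # queued (consumer, record_id, erased_at) notices

    def ingest(self, record_id, data):
        """Add or update a record; refuse anything that was previously erased."""
        if record_id in self.tombstones:
            return False
        self.records[record_id] = dict(data)
        return True

    def deliver(self, record_id, consumer):
        """Hand a copy to a downstream CRM and remember that we did."""
        self.deliveries.setdefault(record_id, set()).add(consumer)
        return dict(self.records[record_id])

    def erase(self, record_id, at=None):
        """Right-to-erasure: drop the data, leave a tombstone, notify downstream."""
        at = at or datetime.now(timezone.utc)
        self.records.pop(record_id, None)
        self.tombstones[record_id] = at
        for consumer in sorted(self.deliveries.get(record_id, ())):
            self.outbox.append((consumer, record_id, at))
```

The `deliveries` map is the piece that answers the follow-up question. Without a record of who received what, there is nobody to notify about the copy that went to your CRM yesterday.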

Ask about freshness metrics. Average age of the records they would return for a typical search. Distribution of last-verified dates. Cohort retention of records over time. A real engineering team can answer these. A marketing team cannot.
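These numbers are cheap to compute if the last-verified dates exist at all. The Python sketch below shows the kind of report to ask for. The bucket boundaries of 30, 90 and 365 days are arbitrary choices for illustration.

```python
from datetime import date
from statistics import median


def freshness_report(last_verified_dates, today, buckets=(30, 90, 365)):
    """Summarise record age in days for the results of a typical search."""
    ages = sorted((today - d).days for d in last_verified_dates)
    if not ages:
        return {"count": 0}
    report = {
        "count": len(ages),
        "median_age_days": median(ages),
        "mean_age_days": sum(ages) / len(ages),
    }
    # Share of records verified within each window, as a fraction of the total.
    for limit in buckets:
        report[f"share_within_{limit}d"] = sum(a <= limit for a in ages) / len(ages)
    return report
```

The median and the bucket shares matter more than the mean, because a long tail of very old records drags the mean around without saying much about what a recruiter will actually see.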

Ask where the consent lives. If a candidate complains, who owns the record of consent? You, the vendor, or a chain of upstream sources that nobody can produce on demand? The answer should be "us, with full audit trail". Anything else is risk transfer in your direction.

Why I am optimistic about the next five years

The narrative around LinkedIn tightening its API and the various AI hiring laws coming online is usually framed as bad news for the sourcing industry. I think it is the opposite. The next five years will see a separation between platforms that built on shortcuts and platforms that built on durable data relationships. The first group will spend more time in court than building. The second group will quietly take the market.

The recruitment teams who will win are the ones using platforms that combine candidate-volunteered data, team-side enrichment via browser extensions, aggregated public signals, vetted third-party enrichment, and proper consent loops. No single layer is enough. The combination is the moat. The platforms that have spent the last few years quietly building that combination, rather than chasing profile-count headlines, are the ones the market will trust over the next decade.

If you want to see how this looks in practice, the Recruitly sourcing page walks through how the layers fit together in a single product. The underlying philosophy applies whether you use us, build it yourselves, or stitch together a stack of point tools. The shape of the answer is the same.
