How to Scrape Data from LinkedIn Using Proxies

With over 500 million users, LinkedIn is the digital Rolodex of the modern age. If you don’t have an account you should probably get one. You can rub shoulders with major players in your industry, creep on old high school acquaintances, and strategize your next business move.

That’s all for the normal user of LinkedIn, which I am, and which you should be.

However, for the scraper, LinkedIn has an entirely different meaning. Instead of connecting manually with people in an industry, scrapers see LinkedIn as a gold-filled mine of personal information. A mine with 500+ million (and growing) nuggets, all of which can be harvested in a variety of ways.

Then there are company profiles on LinkedIn, which are separate from individual users and add an entirely different element for a scraper.

Why Scrape LinkedIn?

LinkedIn is a literal representation of people and companies in the workforce, and they keep their info up to date. This data is incredibly valuable.

Of course, you can’t scrape all the data I listed above. But you can scrape some of it.

Does LinkedIn Allow Scraping?

LinkedIn’s User Agreement prohibits scraping. While that language is solid, this subject is best illustrated by the lawsuit LinkedIn filed against 100 anonymous data scrapers who did what you’re trying to do but did it poorly. The case has not been decided at the time of writing, and it brings up many issues around scraping that are beyond the purview of this article.

The point I’m trying to make is that if you do plan to scrape LinkedIn, be very cautious. They really don’t want you to do it, so if you plan to, you have to do it right.

How to Scrape LinkedIn

  • The applications required to do the scraping
  • The parameters you need to set in the applications
  • The type of pages you will scrape on LinkedIn (public or private)
  • The types of proxies to use, and how many proxies to use

A simple example of scraping LinkedIn with Python

LinkedIn Crawling Applications

Choosing an application is important, as many of them cost money. You’ll want to have a full understanding of the software itself, and then what you’re trying to get out of LinkedIn in order to make a solid return on your investment.

Application Parameters to Note

1. Threads

The very cautious use one thread per proxy. That’s what a true human does, so anything more than that will, at some point, become suspicious. However, plenty of scrapers use up to 10 threads per proxy.

Due to LinkedIn’s extreme policy against scraping, I recommend sticking to a single thread per proxy. Yes, it will slow results and cost more in the long run. In my view, those are costs built into scraping LinkedIn and avoiding a lawsuit.
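Here is a minimal Python sketch of the one-thread-per-proxy idea. It assumes you supply your own `fetch(url, proxy)` function; that name, along with `split_round_robin` and `run_one_thread_per_proxy`, is hypothetical and does not come from any particular scraping tool.

```python
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(urls, proxies):
    """Give each proxy its own list of URLs so no proxy is shared by two threads."""
    buckets = {proxy: [] for proxy in proxies}
    for i, url in enumerate(urls):
        buckets[proxies[i % len(proxies)]].append(url)
    return buckets


def run_one_thread_per_proxy(urls, proxies, fetch):
    """Run fetch(url, proxy) with exactly one worker thread per proxy."""
    buckets = split_round_robin(urls, proxies)

    def worker(proxy):
        # Requests for one proxy are made strictly one after another.
        return [fetch(url, proxy) for url in buckets[proxy]]

    with ThreadPoolExecutor(max_workers=len(proxies)) as pool:
        results = list(pool.map(worker, proxies))
    return dict(zip(proxies, results))
```

Because each proxy owns its bucket of URLs, adding more URLs makes the run longer, not more aggressive per IP. That is the trade-off described above.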

2. Timeouts

If your timeouts are set to 10 seconds, your scraper will send another request through the proxy after 10 seconds of the server not responding.

Many scrapers set the timeout very low: 1 or 2 seconds. This produces a huge number of results because it creates new requests for information often, meaning you get results more often.

Don’t do this. Set your timeouts high, between 30–60 seconds. This gives the server a solid pause before that particular proxy sends another request.

Think of it like a human: does a human reload a website’s home page every second if there is lag? Maybe, but they don’t do it a thousand times in a thousand seconds on repeat.

By setting your timeouts high you avoid a lot of the detection by LinkedIn and don’t overwhelm them with repeated requests.
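As a rough sketch of that advice in Python: the helper below waits somewhere in the 30–60 second range before it retries anything. The names `fetch_patiently` and `fetch` are my own placeholders, and I am assuming your `fetch(url, timeout=...)` function raises `TimeoutError` when the server does not answer in time.

```python
import random
import time

MIN_WAIT, MAX_WAIT = 30, 60  # seconds, the range suggested above


def fetch_patiently(fetch, url, max_attempts=3, sleep=time.sleep):
    """Call fetch(url, timeout=...) and pause 30-60 s before any retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url, timeout=MAX_WAIT)
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up instead of hammering the server
            # A randomized pause looks less mechanical than a fixed one.
            sleep(random.uniform(MIN_WAIT, MAX_WAIT))
```

The `sleep` argument is only there so you can swap in a fake during testing; in a real run you would leave it at the default.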

Scraping Public Profiles on LinkedIn Through Search Engines

LinkedIn has plenty of public pages. These can be viewed without an account and can therefore be scraped without logging in.

You are free to scrape public pages on LinkedIn like any normal scrape that starts with a search engine. You have to enter the correct search terms, such as including “LinkedIn.com”, which will generate results in Google that point to specific LinkedIn pages.

Your scraper can then access the information available on these public pages and return it to you. You’ll be scraping both Google and LinkedIn in this context, so you’ll want to be careful not to set off the alarm bells for either of them.

You can get very specific with this, using a search engine to find the LinkedIn company pages of a particular industry sector or of a specific company, like Microsoft, Google, or Apple. You would do this by searching for “Apple LinkedIn” and then scraping the results.

This will only give you public pages though, and you may not want to be limited.
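To make the search step concrete, here is a small Python sketch that builds a Google search URL restricted to LinkedIn pages. It uses Google's `site:` operator, which is my substitution for simply typing “LinkedIn.com” into the query; the function name `build_search_url` and the `path` parameter are illustrative assumptions.

```python
from urllib.parse import urlencode


def build_search_url(company, path="company"):
    """Return a Google search URL limited to linkedin.com/<path> pages."""
    query = f'site:linkedin.com/{path} "{company}"'
    return "https://www.google.com/search?" + urlencode({"q": query})


print(build_search_url("Apple"))
```

Passing `path="in"` instead would aim the same search at personal profile URLs rather than company pages.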

Use Rotating Backconnect Proxies for Anonymous Scraping

So, if you just want to scrape public profiles, the best solution is to use rotating backconnect proxies for scraping data from Google and LinkedIn.

  • Luminati — 72+ million residential IPs in proxy pool
  • Smartproxy — 40+ million residential IPs in proxy pool
  • Shifter — 31+ million residential IPs in proxy pool
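With a backconnect service you typically point your scraper at a single gateway address and the provider rotates the exit IP behind it. Below is a minimal sketch of that setup using only Python's standard library. The gateway host, port, and credentials are made up; your provider will give you the real ones.

```python
import urllib.request

# Hypothetical gateway address; replace with the one from your provider.
GATEWAY = "http://USERNAME:PASSWORD@gateway.example.com:8000"


def make_gateway_opener(gateway=GATEWAY):
    """Build a urllib opener that sends HTTP and HTTPS traffic via the gateway."""
    handler = urllib.request.ProxyHandler({"http": gateway, "https": gateway})
    return urllib.request.build_opener(handler)


# Usage (not run here, since it needs a real gateway):
# opener = make_gateway_opener()
# html = opener.open("https://www.example.com/", timeout=45).read()
```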

Scraping Private Profiles on LinkedIn

Private pages are another matter. When a person signs up with LinkedIn they are told their information will be kept private, not sold to other companies, and used for internal use only. When a scraper comes along to grab that information LinkedIn has a major problem on its hands.

I don’t condone this activity if you’re using your scrape to sell an individual’s information. This basically means you’d be bypassing LinkedIn’s privacy clause, harvesting personal information from people, then selling it to companies for a profit. Not the coolest thing to do.

There are other reasons to scrape this information though. Maybe you’re on a job hunt and want to find programmers in a specific city or available jobs in a new state. You can scrape for research, too. Either of these seems fine to me, but the for-profit model doesn’t.

Create Accounts

To do this I recommend Octoparse. Their software allows you to log in to LinkedIn with an account and apply specific searches and scrapes with a drag and drop interface, all while showing you the LinkedIn page you’re on. It’s very nice visually if a little clunky to use.

You could figure out a way to do it with other applications but it won’t be as easy.

Search and Harvest

Much of the information is still private unless you connect with people, and if you do that you’re basically just running a normal LinkedIn account.

Use Dedicated Proxy Per Account

Also, make sure you’re using one proxy IP address to create the account, and then scrape on that account. This is all about appearing like a human. Most humans don’t access LinkedIn from a different IP address every few hours. They access it from one IP address: their home address.

If you create the account with a proxy IP, use the same proxy IP to scrape on the account, and set all your parameters correctly, you will greatly reduce the chances of getting blocked or banned.
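One simple way to enforce this in your own tooling is to fix the account-to-proxy pairing once and never change it. The Python sketch below does that; `assign_dedicated_proxies` is a hypothetical helper, and the account names and proxy addresses are placeholders.

```python
def assign_dedicated_proxies(accounts, proxies):
    """Pair each account with exactly one proxy; never share or swap them."""
    unique = list(dict.fromkeys(proxies))  # drop duplicates, keep order
    if len(unique) < len(accounts):
        raise ValueError("need at least one unique proxy per account")
    return dict(zip(accounts, unique))


pairing = assign_dedicated_proxies(
    ["account_a", "account_b"],
    ["http://10.0.0.1:8080", "http://10.0.0.2:8080"],
)
print(pairing["account_a"])
```

Save the resulting mapping somewhere permanent, so the account that was created on one IP keeps logging in from that same IP.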

Types and Number of Proxies

The final element in all this is the types of proxies you use, and how many of them you use. This coincides pretty heavily with your budget because more proxies (and better ones) equals more cash. Keep that in mind for this whole process.

If you want to scrape the private profiles of LinkedIn accounts, you have to use a dedicated proxy for each account. You have to log in to view others’ private profiles, and LinkedIn is strict about IP addresses: when you log in to an account from a different IP, you have to verify via email.

You want elite private proxies for scraping LinkedIn. With a lawsuit underway, LinkedIn is not kidding around about punishing scrapers, so use elite dedicated proxies and nothing else.

These proxies offer the most anonymous and secure header settings out of all the proxy types, and give you unfettered access and speeds. Shared proxies or free proxies (even lesser private proxies) are simply not secure or fast enough to do the job.

You’ll also want to test your proxies to make sure they work with LinkedIn. Due to LinkedIn’s anti-scrape stance, it has a large list of blacklisted IPs. If your proxies are on this list they won’t work at all. Contact your provider to get these details, or test the proxies yourself and then talk to your provider.
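If you want to test proxies yourself, a check along these lines is one way to start. It is a sketch using Python's standard library, and it treats anything other than an HTTP 200 response as a failure, which is my own simplification. The function name `proxy_works` and the `open_url` hook (there so the check can be tried without a network) are assumptions.

```python
import urllib.request


def proxy_works(proxy_url, test_url="https://www.linkedin.com/",
                timeout=30, open_url=None):
    """Return True if test_url answers with HTTP 200 through proxy_url."""
    if open_url is None:
        handler = urllib.request.ProxyHandler(
            {"http": proxy_url, "https": proxy_url}
        )
        open_url = urllib.request.build_opener(handler).open
    try:
        with open_url(test_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection errors, timeouts, and urllib's URLError/HTTPError.
        return False
```

Run it once per proxy before a scrape and drop the ones that return `False`.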

Number of Proxies

If you stick to a single proxy per account and want to harvest a lot of data quickly, consider 50 accounts and 50 proxies as a place to get started.

If you want to do more proxies per account (which I don’t recommend), grab somewhere in the 100–200 range and rotate them often so they don’t get noticed, then blocked, banned, and blacklisted.

The fewer proxies you have the more often they’ll be detected. This is always an experiment, so make sure you test everything.
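If you do go the rotating route, the rotation itself can be as simple as the Python sketch below: hand out proxies in turn and drop any that get blocked. `ProxyPool` is a hypothetical class of my own, not part of any proxy provider's tooling.

```python
class ProxyPool:
    """Hand out proxies round-robin and forget the ones that get blocked."""

    def __init__(self, proxies):
        self._proxies = list(proxies)
        self._index = 0

    def next(self):
        if not self._proxies:
            raise RuntimeError("no working proxies left")
        proxy = self._proxies[self._index % len(self._proxies)]
        self._index += 1
        return proxy

    def discard(self, proxy):
        # Call this when a proxy is blocked or banned.
        if proxy in self._proxies:
            self._proxies.remove(proxy)
```

The smaller the pool, the sooner each proxy comes back around, which is the detection risk mentioned above.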

Wrapping Up

B2B marketing specialist, and want to help those who are looking for information about e-business data analysis and integration, marketing automation, etc.
