How Do I Use a Node-Fetch Proxy?
Justas Vitaitis
A node-fetch proxy can take your web scraping apps to the next level.
Node Fetch is a popular library that brings the Fetch API to Node.js. With it, you can connect to pages, send POST data, and request content, making it a suitable tool for many tasks, including Node.js web scraping. In addition, Node.js has included native support for the fetch function since v18.
But there’s a problem. Out of the box, there’s no node-fetch proxy option. Therefore, you can get blocked really quickly when using only your own IP address.
Today, we will explore how you can fix this using a code library. In addition, you'll learn how to use node-fetch proxies with authentication and custom user agents to avoid getting blocked.
Let’s get started!
What Is Fetch in Node.js?
Fetch in Node.js is a library that works just like the Fetch API for browsers. The difference is that the Fetch API is only available on the client side, while Node.js Fetch is available on the backend.
Therefore, the Fetch API is a suitable tool for front-end developers. They can use Fetch to load resources using many options. On the other hand, Node Fetch is available to backend developers using a Node.js server. They can load resources programmatically and interact with them.
Is Node-Fetch the Same as Fetch?
Node-fetch and fetch are not the same thing, but they have very similar behavior and syntax.
window.fetch(), also known as fetch, is a client-side function, so you can run it from your browser. Node-fetch is a backend library available in Node.js. You can run it programmatically from your Node.js server.
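To make this concrete, here's a minimal sketch of the same request made with the built-in fetch in Node.js 18+ (the target URL is just an example):

// Node.js 18+: fetch is available globally, no installation needed
(async () => {
  const response = await fetch('https://ipv4.icanhazip.com/');
  console.log(await response.text());
})();

// On older Node.js versions, install node-fetch and import it first:
// const fetch = require('node-fetch');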
How Do I Use a Node-Fetch Proxy?
There are two main ways to use a node-fetch proxy. You can use a code library that supplies a custom HTTP agent, or you can use a reverse proxy.
Web Scraping Without Getting Blocked With a Node-Fetch Proxy
Although web scraping is legal and very useful, many sites try to block it. They rely on a few signals to do it, and two of the biggest are the IP address and the request headers.
They check whether a single IP address is performing a large number of requests, or whether many requests arrive around the same time. This is a telltale sign of a bot rather than a real user.
Regarding the request headers, they check whether the request metadata looks like it came from a real browser. If you don't set any options, a bot will just request the URL without sending any information about itself, while a real user's browser sends plenty of data, such as browser name, version, language, and more. Therefore, requests without this metadata are quite suspicious.
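For example, the extra metadata a real browser attaches to a request looks something like this (the header values below are illustrative):

// Illustrative sketch of the headers a real browser sends with each request
const browserLikeHeaders = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language': 'en-US,en;q=0.5'
};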
Let’s see how you can avoid getting blocked by fixing these two issues.
You can fix the IP detection by using IPRoyal's residential proxy service. With it, you connect using different IP addresses from real residential users around the world. Website owners won't be able to tell that two different requests with different IP addresses are from the same user, so you'll be left alone.
After you sign up, you get access to the Client Area, where you can see the proxy details you need to use in your connections.
In addition to authenticated requests, you can whitelist an IP address if you want. This feature allows you to use your proxy without sending a username and password.
Now you have your proxy details. Let's use this data in your code to implement a web scraper with a node-fetch proxy.
How to Use Https-Proxy-Agent
One of the options for implementing a node-fetch proxy is to use a code library that provides a custom HTTP agent. A popular library for this is https-proxy-agent by Nathan Rajlich.
In terms of implementation, it's quite simple. You just need to install node-fetch (you can skip it if you're running Node.js v18 or higher) and https-proxy-agent:
npm install node-fetch
npm install https-proxy-agent
Then, in your code, pass an HttpsProxyAgent instance as the agent parameter of your fetch request, like this:
// node-fetch v2 (CommonJS); v3 and later are ESM-only and need import instead
const fetch = require('node-fetch');
// https-proxy-agent v7+ uses a named export; older versions exported the class directly
const { HttpsProxyAgent } = require('https-proxy-agent');

(async () => {
  // No credentials here, so your own IP address must be whitelisted
  const proxyAgent = new HttpsProxyAgent('http://geo.iproyal.com:12321');
  const scrape = await fetch('https://ipv4.icanhazip.com/', { agent: proxyAgent });
  const html = await scrape.text();
  console.log(html);
})();
Notice that this code doesn't use any proxy authentication, so if you want to use it this way, you need to whitelist your own IP address.
You can pass the authentication details to HttpsProxyAgent in a few ways. The simplest one is to include the username and password as plain text in the proxy URL, like this:
(async () => {
  // Credentials embedded in the proxy URL as username:password@host:port
  const proxyData = new HttpsProxyAgent('http://username:password@geo.iproyal.com:12321');
  const scrape = await fetch('https://ipv4.icanhazip.com/', { agent: proxyData });
  const html = await scrape.text();
  console.log(html);
})();
Other options for node-fetch proxy authentication with https-proxy-agent also exist, such as passing credentials through the auth option or custom headers.
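For instance, here's a hedged sketch of passing credentials through a Proxy-Authorization header instead of embedding them in the URL. It assumes https-proxy-agent v7 or newer, which accepts a headers option in its constructor:

// Sketch: basic proxy authentication via the Proxy-Authorization header
// (assumes https-proxy-agent v7+, which accepts a headers option)
const { HttpsProxyAgent } = require('https-proxy-agent');

const credentials = Buffer.from('username:password').toString('base64');
const proxyAgent = new HttpsProxyAgent('http://geo.iproyal.com:12321', {
  headers: { 'Proxy-Authorization': `Basic ${credentials}` }
});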
How Do I Use Node-Fetch Proxy User Agents?
You can set a custom user agent using the options argument in your node-fetch proxy call. So instead of the simple fetch request:
const scrape = await fetch('https://ipv4.icanhazip.com/');
You can do it using the options argument after the URL:
const scrape = await fetch('https://ipv4.icanhazip.com/', { headers: { /** request headers here **/ }} );
Therefore, in addition to user agents, you can pass other header arguments. Here's a code sample using a node-fetch proxy and a custom user agent at the same time:
(async () => {
  const proxyData = new HttpsProxyAgent('http://username:password@geo.iproyal.com:12321');
  const options = {
    agent: proxyData,
    headers: {
      // Mimic the user agent a real browser would send
      'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 13_0_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15'
    }
  };
  const scrape = await fetch('https://ipv4.icanhazip.com/', options);
  const html = await scrape.text();
  console.log(html);
})();
In this case, we only set the user agent, but you can pass any request headers you want.
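If you want to take this further, you can rotate user agents between requests so your traffic looks even less uniform. Here's a minimal sketch; the user agent strings are illustrative, and proxyData is the HttpsProxyAgent from the previous example:

// Illustrative sketch: pick a random user agent for each request
const userAgents = [
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 13_0_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
];

const options = {
  agent: proxyData, // the HttpsProxyAgent from the previous example
  headers: {
    // pick a random entry on every request
    'User-Agent': userAgents[Math.floor(Math.random() * userAgents.length)]
  }
};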
Conclusion
Today, we learned how to use node-fetch proxies to scrape websites without getting blocked. In addition, you saw how to use custom request headers to make your scraping functions even better.
Now you can connect your scraper with a parser and extract data from any site you want.
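As a hedged sketch of that next step, here's how you could feed the fetched HTML into a parser. This example assumes the third-party cheerio package (npm install cheerio), and example.com is just a placeholder:

// Sketch: parsing fetched HTML with cheerio (a third-party package, assumed here)
const cheerio = require('cheerio');

(async () => {
  const response = await fetch('https://example.com/', { agent: proxyData });
  const html = await response.text();
  const $ = cheerio.load(html); // parse the HTML into a queryable document
  console.log($('title').text()); // extract the page title
})();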
We hope you’ve enjoyed it, and see you again next time!
FAQ
ReferenceError: fetch is not defined
This happens when Node.js can't find the fetch function. If you're running a Node.js version below v18, you'll need to install node-fetch or a similar module.
If that's the case, make sure you've installed node-fetch and included it in your script using:
const fetch = require('node-fetch');
Node fetch timeout
If you're facing timeout issues or want to add a timeout option, you can use a timeout promise or an AbortController, or use a library such as hpagent to control timeouts manually.
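For instance, here's a minimal timeout sketch using an AbortController, which recent node-fetch versions and the built-in fetch both support through the signal option:

(async () => {
  // Abort the request if it takes longer than 5 seconds
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 5000);
  try {
    const response = await fetch('https://ipv4.icanhazip.com/', {
      signal: controller.signal
    });
    console.log(await response.text());
  } finally {
    clearTimeout(timer); // always clear the timer
  }
})();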
Error: Cannot find module 'node-fetch' after running 'npm install node-fetch'
Before installing node-fetch, make sure that npm itself is up to date. Try something like this:
npm install -g npm
npm cache clean --force
npm update
Another option is to install it locally instead of globally:
npm i node-fetch
You could also try importing it as an ES module instead of using require:
import fetch from 'node-fetch';
Or you can load it dynamically like this:
const fetch = (...args) => import('node-fetch').then(({default: fetch}) => fetch(...args));
Author
Justas Vitaitis
Senior Software Engineer
Justas is a Senior Software Engineer with over a decade of proven expertise. He currently holds a crucial role in IPRoyal’s development team, regularly demonstrating his profound expertise in the Go programming language, contributing significantly to the company’s technological evolution. Justas is pivotal in maintaining our proxy network, serving as the authority on all aspects of proxies. Beyond coding, Justas is a passionate travel enthusiast and automotive aficionado, seamlessly blending his tech finesse with a passion for exploration.