
Collecting all the email addresses or links on a webpage with JavaScript

Have you ever wanted, or needed, to collect all of the email addresses or links on a website?

Have you ever tried to do this manually?

I have. It’s incredibly tedious.

There’s nothing quite as dehumanizing as robotically copying and pasting hundreds of email addresses or links from a webpage into an Excel sheet, over and over again.

So let’s make sure you never have to do that again.

In this guide, I’ll show you how to use JavaScript to collect all of the email addresses or links on a webpage.

It’s actually pretty easy. So let’s get started.

Thinking before writing your code

Before we begin writing our program, it’s always useful to think about what we’re trying to accomplish; this helps us write a more coherent program.

Here’s what we know we’ll need to do for this program:

  1. Search through the content of the webpage for email addresses or links.
  2. Collect the email addresses or links.

Seems pretty straightforward, right? It is!

Now let’s break down each of the steps above into actual statements that we’ll need to write to complete the task of collecting emails from a website.

Here’s what we need to do:

  1. Create an empty array to populate with email addresses or links.
  2. Specify where in the DOM we want JavaScript to search.
  3. Convert the content of the DOM to a string.
  4. Use the .match method to specify what we’re searching for.
  5. Add the matched items to our array.

Also pretty straightforward, so let’s start writing some code.

I created a test page for you to practice with. Please open this page in a new window beside this one, activate your developer console, and follow the instructions below.

Collecting all the email addresses on a webpage

Once you’ve opened up this page and started your JavaScript developer console, you’ll need to create an empty array to store the email addresses.

Please type this into your JavaScript developer console:

var listOfEmails = [];

Great, now let’s determine where we want JavaScript to look for these email addresses.

In the test page I created for you, the email addresses are within the opening <body> and closing </body> HTML tags of the page.

Let’s write a variable and assign the content within these tags to it:

var contentToSearch = document.body.innerHTML;

Now let’s verify the content assigned to our variable is correct by typing the following in our JavaScript developer console:

contentToSearch;

Did you see the HTML content of the page appear in your developer console? Excellent.

Now we need the content as a string so we can search it. The innerHTML property already returns a string, so applying the .toString method to our contentToSearch variable doesn’t change anything; we’ll call it here only to make the step explicit:

var contentAsText = contentToSearch.toString();

All of the content within the opening <body> and closing </body> HTML tags is now assigned, as a string, to the variable contentAsText.
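
If you’d like to confirm that the content really is a string, here’s a quick check. It’s only a sketch: the sampleHtml variable and its fallback markup are stand-ins I made up for illustration (they are not part of the test page), so the snippet also runs outside a browser.

```javascript
// Hypothetical sample markup; in a browser, document.body.innerHTML is used instead.
var sampleHtml = typeof document !== "undefined"
  ? document.body.innerHTML
  : '<p>Contact: <a href="mailto:jane@example.com">jane@example.com</a></p>';

// innerHTML hands back a string, so toString() returns the very same value.
console.log(typeof sampleHtml);                    // "string"
console.log(sampleHtml.toString() === sampleHtml); // true
```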

Let’s now search through it for the email addresses.

To do this we’ll use the .match method on the variable contentAsText in conjunction with a regular expression that matches some standard email address patterns:

listOfEmails = contentAsText.match(/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/gi);

Now, to see the list of emails, there’s just one final step. Type the following in your browser console:

listOfEmails

You should see your list of emails! Well done!

Here’s the entire program we just wrote.

Final program for collecting all the email addresses on a webpage

var listOfEmails = [];
var contentToSearch = document.body.innerHTML;
var contentAsText = contentToSearch.toString();
listOfEmails = contentAsText.match(/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/gi);
console.log(listOfEmails);

Please don’t forget to run this code in your web browser’s developer console.
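
The program above assumes the page contains at least one address. When nothing matches, .match returns null rather than an empty array, and an address that appears twice in the HTML (for example, in both a link and its visible text) shows up twice in the list. Here’s a minimal sketch that handles both cases. The extractEmails function and the sample string are hypothetical names I chose for illustration.

```javascript
// Hypothetical helper: returns the unique email-like strings found in a piece of text.
function extractEmails(text) {
  var pattern = /[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+/g;
  var matches = text.match(pattern) || []; // match returns null when nothing is found
  return Array.from(new Set(matches));     // a Set keeps only the first copy of each address
}

// Made-up sample markup, standing in for document.body.innerHTML.
var sample = '<a href="mailto:jane@example.com">jane@example.com</a> and bob@example.org';
console.log(extractEmails(sample));              // ["jane@example.com", "bob@example.org"]
console.log(extractEmails("no addresses here")); // []
```

In the browser console you would call extractEmails(document.body.innerHTML) instead of passing the sample string.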

Collecting all the links on a webpage

This will be pretty similar to the program we wrote for collecting all the email addresses.

Please open this page in a new window beside this one, activate your developer console, and start by creating an array to store our links:

var listOfLinks = [];

Now let’s collect all the links on the page.

Lucky for us, the document object has a built-in property called .links, which returns a collection of all the links on a page, so there’s no need to write a custom function.

Let’s write a variable to hold the value of the .links property:

var collectLinks = document.links;

Now we’ll need to loop through the links one by one and add them to our array.

Let’s do this with a JavaScript loop which utilizes the .push method:

for(var i=0; i<collectLinks.length; i++) {
  listOfLinks.push(collectLinks[i].href);
}

Did you see the number 12 in your developer console? That’s the value returned by the last .push call, which is the new length of the array.

If you did, congratulations, you’ve just collected all of the links on the page.

Now let’s take a look at the content of our array by typing the following:

listOfLinks;

Notice it collected the email addresses as well as the links?

That’s perfectly normal, since email links are also written with the HTML <a> tag and an href attribute; their href typically just starts with mailto: instead of a web address.

Well done!
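
As an optional extra, you can separate the two kinds of entries after collecting them. The sketch below assumes the email links use the mailto: scheme; the splitLinks function and the sample values are hypothetical and only there for illustration.

```javascript
// Hypothetical helper: splits href strings into email addresses and ordinary links.
function splitLinks(hrefs) {
  var result = { emails: [], links: [] };
  hrefs.forEach(function (href) {
    if (href.toLowerCase().indexOf("mailto:") === 0) {
      // Drop the 7-character "mailto:" prefix and any ?subject=... part after the address.
      result.emails.push(href.slice(7).split("?")[0]);
    } else {
      result.links.push(href);
    }
  });
  return result;
}

// Made-up values, standing in for the listOfLinks array.
var sampleHrefs = ["https://example.com/about", "mailto:jane@example.com?subject=Hi"];
console.log(splitLinks(sampleHrefs));
```

In the browser console you would pass your own array: splitLinks(listOfLinks).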

Here’s the entire program we just wrote.

Final program for collecting all the links on a webpage

var listOfLinks = [];
var collectLinks = document.links;
for(var i=0; i<collectLinks.length; i++) {
  listOfLinks.push(collectLinks[i].href);
}
console.log(listOfLinks);

Please don’t forget to run this code in your web browser’s developer console.
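
If you prefer something more compact, Array.from can do the looping for you, because it accepts any array-like collection along with a function to apply to each item. This is just a sketch of that alternative: collectHrefs and fakeLinks are hypothetical names, and fakeLinks stands in for document.links so the snippet runs outside a browser.

```javascript
// Hypothetical helper: accepts any array-like collection of objects that have an href property.
function collectHrefs(links) {
  return Array.from(links, function (link) {
    return link.href;
  });
}

// Made-up stand-in for document.links (an array-like object with a length).
var fakeLinks = {
  0: { href: "https://example.com/" },
  1: { href: "mailto:jane@example.com" },
  length: 2
};
console.log(collectHrefs(fakeLinks));
```

In the browser console you would call collectHrefs(document.links).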

Why this works

You might be wondering how this is possible, considering you don’t have the ability to edit the JavaScript files on a website you are visiting.

Well, you don’t actually need to edit a website’s JavaScript files, because you can run your own JavaScript against the loaded page in your web browser’s developer console. This is one of the handiest things about working with JavaScript in the browser!

So, with only our web browser and a little bit of JavaScript know-how, we were able to collect all the email addresses and links on a webpage.

Pretty handy, right?

Next, let’s learn how to add some intelligence to our programs with conditional statements which let our programs make decisions on their own.