Scraping data is all the rage nowadays. But what many don’t know is that you don’t need to be a fancy hacker to be able to collect data from websites. In fact, you don’t even need any coding skills.
A multitude of tools such as browser extensions exist to alleviate the required technical knowledge. But even if that’s one hurdle too far for you, do not worry. Google will come to the rescue. Google Sheets to be precise.
It has a nifty little formula which allows you to grab a web page’s list or table of data into your sheet of choice. It’s called importhtml, and it works as follows:
Find a website you want to pull data from
As an example, let’s go for the List of largest technology companies by revenue Wikipedia page.
Identify a table or list you want to grab
In this case, let’s go for the second table on the page, in the ‘2019 list’ section.
Now, type the following into a cell of choice, let’s say A1
=IMPORTHTML(“https://en.wikipedia.org/wiki/List_of_largest_technology_companies_by_revenue”, “table”, 2)
As you can see, you’ll have to declare three things in the formula: The url you want to grab data from, the type of data (either table or list), and the position (in this case the second table, so 2).
Hit enter, and voilà, the table appears in your sheet:
To go next level, and actually transform or clean that data, make sure that it becomes static instead of linked data first. To do so, select the table, right mouse click on cell A1, ‘Paste special’ > ‘Paste values only’.
So there you go, have fun playing with data in Google Sheets!