Python for Power Systems

A blog for power systems engineers to learn Python.

Python Based Link Checker

I recently needed a link checker to create a CSV-formatted list of all links (especially hosted PDFs) on a client site.

There is a tool called webcheck by Arthur de Jong which does a great job of checking all of the links on a website and creating a pretty HTML report.

This got me most of the way there. The output included a page listing every URL encountered during the search, which looked like what I wanted, but it was formatted as HTML.

I wrote a small script that uses webcheck’s own code to read its stored .dat file and write all of the links to a CSV file with the format:

path, extension, internal, errors

Here path is the URL, extension is the URL's ending (for example .pdf or .html), internal is True or False depending on whether the link is internal to the site, and errors is the error (for example 404), if any, for that link.
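To make the column layout concrete, here is a minimal sketch of how such rows could be produced in Python 3. It does not use webcheck's code or read its .dat file: the `sample_links` list is made-up data standing in for whatever webcheck has stored, and the names `link_row`, `write_links_csv` and `site_host` are hypothetical helpers chosen for this illustration. It also assumes that "internal" simply means the link's host matches the site's host.

```python
import csv
import io
import posixpath
from urllib.parse import urlparse


def link_row(url, site_host, errors=""):
    """Build one [path, extension, internal, errors] row for a link."""
    parsed = urlparse(url)
    # Take the extension from the URL path only, so a query string
    # such as ?page=2 does not end up in the extension column.
    extension = posixpath.splitext(parsed.path)[1].lower()
    # Assumption: a link is internal when its host matches the site's host.
    internal = parsed.netloc.lower() == site_host.lower()
    return [url, extension, internal, errors]


def write_links_csv(links, site_host, out):
    """Write (url, errors) pairs to the file-like object `out` as CSV."""
    writer = csv.writer(out)
    writer.writerow(["path", "extension", "internal", "errors"])
    for url, errors in links:
        writer.writerow(link_row(url, site_host, errors))


# Made-up sample data; the real list would come from webcheck's .dat file.
sample_links = [
    ("http://example.com/docs/report.pdf", ""),
    ("http://example.com/index.html", ""),
    ("http://other.example.org/missing", "404"),
]

buffer = io.StringIO()
write_links_csv(sample_links, "example.com", buffer)
print(buffer.getvalue())
```

Writing to a file-like object keeps the sketch easy to try out; to produce a real file, pass `open("links.csv", "w", newline="")` instead of the `StringIO` buffer.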