Broken Link Report

jdc 3-jan-2013 : For a list of links believed to be broken on the wiki, check https://wiki.tcl-lang.org/_/brokenlinks . It is updated daily.

https://www.tcl-lang.org has a lot of broken links. I ran TclLinkCheck against it, and put the results up here:

http://www.dedasys.com/report.html DEAD LINK

I think it wouldn't be a bad idea to do this with the Wiki, as well.


Would it be an idea to integrate this into wikit - or run it periodically on the same site? -jcw


Wouldn't it be neat if we could wrap the links in http://web.archive.org calls and check to see if the pages were available in one of the web archives?
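
For what it's worth, a minimal sketch of such a wrapper, assuming the web.archive.org convention of prefixing the original address (the proc name is made up for the example, and the exact URL scheme should be checked against archive.org before relying on it):

  # Build a Wayback Machine lookup for an external URL.
  # Assumes the https://web.archive.org/web/<url> form.
  proc archiveUrl {url} {
      return "https://web.archive.org/web/$url"
  }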


If it were integrated, it could be much more efficient. I think it makes a lot of sense - there are too many dead links in the Tcl world :-(

jcw - Ok, let's try to sketch the logic - the link checker runs once in a while. What should it do?

  • create a page with broken links and page refs (might be quite large)?
  • have a mechanism to only declare a link broken if several checks failed?
  • alter the page, perhaps simply add "[Broken link]"?
  • ...

davidw - If I had to implement it, I would do something like this:

1) Create a page which references all the pages with broken links - but don't include the broken links themselves, at least for now - I think it would be huge.

2) Do a weekly cron job that goes through each page, tests its external links (a rough sketch of such a test follows below), and updates the page with some kind of broken link tag. The original URL should not disappear, because it might be useful in tracking down the correct URL, especially in cases where it's just a spelling mistake. This could even be monthly, as, after the initial run, its main purpose would be to turn up links that have gone stale in the meantime, which hopefully won't be happening every day.

3) Maybe each night check the pages that were changed that day, to make sure we're feeding good data into the wiki.
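
A minimal sketch of the per-link test such a job could run, using Tcl's http package; the HEAD-style -validate request, the 10-second timeout, and treating anything outside 2xx/3xx as broken are assumptions, and https URLs would also need the tls package registered.

  package require http

  # Return 1 if the URL appears dead, 0 if it answered.
  proc linkBroken {url} {
      if {[catch {http::geturl $url -validate 1 -timeout 10000} tok]} {
          return 1                              ;# could not connect at all
      }
      set status [http::status $tok]
      set code   [http::ncode $tok]
      set ok [expr {$status eq "ok" && [string is integer -strict $code]
                    && $code >= 200 && $code < 400}]
      http::cleanup $tok
      return [expr {!$ok}]
  }

A weekly run over all external links could then feed both the summary page from step 1 and the per-page markers from step 2.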

AK: Consider an image for the [Broken Link] tag in addition to the text. The text is there for searching, the image for spotting broken links visually when looking at the page.

jcw - Ok, how about marking bad links thus: some text, a link http://blah.blah WikiDbImage badurl.png with some more text. With an "align=middle" and "alt=BADURL" tag, this would not exceed line heights, and would be easy to search for. It can be done for verbatim (fixed font) text too (where URLs are auto-converted to links as well), but I can't show that here.
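
The HTML behind such a marker would be roughly the following; the proc wrapping it is only an illustration, and whether the marker lives in the page source or is added at render time is left open above.

  # Append the bad-URL marker image after a rendered link, using the
  # attributes suggested above so line height is preserved and the
  # marker is easy to search for.
  proc markBadLink {html} {
      append html { <img src="badurl.png" align="middle" alt="BADURL">}
      return $html
  }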


LV After software has identified bad links, then what? Are there volunteers standing ready to look into the situation and resolve the bad links?

I've seen people propose - and in fact implement - deleting bad links. Of course, that results in pages with essentially empty content. And perhaps the link is only down for the current week because the site owner is on vacation, there was a power outage, or someone hijacked their DNS entry.

Or are we saying, in the above text, that the link checker's result would be to integrate the 'broken links' image beside the actual link? If so, then that would mean that many pages would change on a regular basis just from the link checker.

And the link checker needs to be smart enough not to re-mark a link that has already been flagged as broken, and to remove the marker from a link that comes back.
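
A sketch of that idempotence, assuming the marker is a literal string placed right after the URL in the page source (the marker text itself is only an example, not an agreed convention):

  # Add or remove a broken-link marker so repeated checker runs leave
  # the page text stable: mark only once, unmark when the link returns.
  proc markLink {text url broken} {
      set marker { [Broken link]}
      set tagged $url$marker
      if {$broken} {
          if {[string first $tagged $text] < 0} {
              set text [string map [list $url $tagged] $text]
          }
      } else {
          set text [string map [list $tagged $url] $text]
      }
      return $text
  }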


SC Instead of making a change in the source, why not make the change when the text is rendered, whether as HTML or some other format? One way would be to add a class attribute to the link, <a href="..." class="broken">, which the stylesheet could render differently. You could then add a note to each page encouraging folks to try to fix bad links. To do this the page renderer would need access to a db of bad links found by the web spider.
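
On the rendering side that might look something like this; renderLink and isBroken are made-up names, not wikit's actual code, and the stylesheet could then do something like a.broken { color: red } with the broken class.

  # Emit the class attribute SC suggests when the spider has flagged
  # the URL; isBroken would consult the bad-link database (see the
  # mk4tcl sketch further down).
  proc renderLink {url text} {
      if {[isBroken $url]} {
          return "<a href=\"$url\" class=\"broken\">$text</a>"
      }
      return "<a href=\"$url\">$text</a>"
  }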


LV So this latest proposal is to do a link check on each wiki page as it is being converted into HTML. That will increase page rendering time. And since the wiki works from a cache, if I change a page in February 2003 and the sites on that page happen to be unreachable at that moment due to web traffic and so on, then anyone coming to that page months or years later will still see the sites marked as having broken links, even if the problem was resolved moments afterwards.


SC No, I was thinking that the link check could be done periodically and that the renderer would refer to the results of the check when rendering links. I presume that the results could be stored in an MK database and that looking up each link during rendering wouldn't take too long.
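
A rough sketch of such a store with mk4tcl; the file name, view layout and status values are invented for the example, and how wikit actually opens its databases is not shown here.

  package require Mk4tcl

  # A flat view of (url, status) pairs, written by the periodic checker
  # and read by the renderer.
  mk::file open db linkcheck.mk
  mk::view layout db.links {url status}

  # Checker side: record the latest result for a URL.
  proc recordLink {url status} {
      set rows [mk::select db.links -exact url $url]
      if {[llength $rows] == 0} {
          mk::row append db.links url $url status $status
      } else {
          mk::set db.links![lindex $rows 0] status $status
      }
      mk::file commit db
  }

  # Renderer side: the lookup used when emitting a link.
  proc isBroken {url} {
      foreach i [mk::select db.links -exact url $url] {
          if {[mk::get db.links!$i status] eq "broken"} {return 1}
      }
      return 0
  }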

Interaction with the cache might be complicated, I agree. If the link check ran once a month, it could perhaps be run more often on just the bad links, to guard against random traffic-related problems. I'm not sure how the cache logic works, but could cached pages be flagged if they contain bad links, and the cache invalidated when a link is found to be good again?


Maybe this is not the best place, but I see things lowering the quality of our wiki, both visually and in terms of content.

  • DEAD LINKS: I would like to see something like a to-do list, so that we can all work on that front together. A script should collect them.
  • Category transition: The new Category format looks great in the footer, but this should be done by a script across the entire wiki. It should be easy to do.
    DKF: Theoretically yes, practically not quite. There are other "category-like" entities to consider, and some footer links are not so useful when done that way. It's a mess. More useful would be a listing of uncategorised pages...
  • History shortcut: I would like a shortcut in the footer, next to History, to get the latest diff with one click.
  • Recent changes: If a page does not change in substance, but only in format, spelling or other corrections, there should be a separate Recent changes view for that. Basically I'm only interested in changes to the content, not so much in visual changes.

LV When I was younger, with lots of time that I was spending on the wiki, I used to dig through the wiki pages to generate lists of pages with what appeared to be no categories, pages which were not referenced, etc. These days, I find myself with less and less free time, as well as internet access issues.

Basically, I started out with the web site's database, extracted the pages, deleted all the empty ones, grepped through the remaining files to generate a list of those containing the string [Cc]ategory, and then compared that list with the full list. Any file not on it was logged as a potential page to be categorized.
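
In Tcl, the comparison LV describes could look roughly like this, assuming the extracted pages sit as plain files in a pages/ directory (the directory name and the category pattern are assumptions):

  # List non-empty page files that contain no category reference.
  set uncategorised {}
  foreach f [glob -nocomplain pages/*] {
      if {[file size $f] == 0} continue         ;# skip empty pages
      set fh [open $f]
      set text [read $fh]
      close $fh
      if {![regexp {\[[Cc]ategory} $text]} {
          lappend uncategorised $f
      }
  }
  puts [join $uncategorised \n]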

It doesn't mean that finishing off that list gives us a fully categorized wiki... for one thing, people keep adding new pages every day, and not all of them are categorized.

But we would be closer.

Lars H: One thing that could be done is to change the empty page template to contain a link to, say, [Category Uncategorised]. This would make that the category of all (new) pages that haven't been categorised yet, and such pages could then be found just like those in any other category.