Introduction
Today we compare three health website publishers that focus on generating content about health and illness and, more generally, provide information about medical situations: Healthline, WebMD and Healthily.
Traditionally, when comparing website publishers, there is a tendency to use metrics like SEO performance and site visits. But these metrics are indicators of popularity, which is not necessarily a measure of truthfulness.
Instead, we want to compare how well-researched these publishers are. The hope is that if a publisher generally has better-researched articles, there is a higher chance that the publisher’s word is trustworthy.
A Different Measure: Efficacy
When examining how well-researched an article is, what we are really interested in measuring is something called efficacy, which, according to the Cambridge Dictionary, is “the ability, especially of medicine or a method of achieving something, to produce the intended result”.
Applying the definition of efficacy in our context, we are looking at the ability of a publisher’s articles to solve real-world problems effectively.
Measuring efficacy is naturally tough, and there are several possible ways we can attempt this. For example, we could look at anecdotal stories of how people have responded to recommendations from a publisher’s articles. Or we could do a comprehensive study of a publisher’s articles and compare these against existing scientific research. Unfortunately, both approaches are involved and laborious.
Fortunately for us, there is a simpler approach: looking at the sources these publishers tend to cite. This is what we will attempt to do today, using a small sample size and aggregated data from our homegrown tool Quackcheck.
As we will see, this approach is less comprehensive, but nonetheless extremely insightful.
The Experiment
In this study, we took all reference (outbound) links from 20 random articles from each publisher. These articles were generally retrieved from the front page and first-level subtopics of each site. We then ran our link summarizer via the Quackcheck chatbot to get a sense of the types of links that are referenced.
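To make the link-gathering step concrete, here is a minimal sketch of how outbound links could be pulled from an article's HTML. This is not how Quackcheck works internally; the `LinkCollector` class, the `extract_links` helper and the sample HTML are all hypothetical, and the sketch simply assumes that a "reference link" is any absolute http(s) link in an anchor tag.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects absolute http(s) links from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Assumption: only absolute links count as reference links;
            # relative links such as "/about" are skipped.
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def extract_links(html):
    """Return all absolute links found in the given HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Made-up sample page, not taken from any of the three publishers.
sample_html = """
<p>See <a href="https://example.org/study">this study</a> and
<a href="/about">our about page</a>.</p>
<a href="https://publisher.example/related">Related</a>
"""
links = extract_links(sample_html)
print(links)
```

A real pipeline would also need to decide which part of the page counts as the reference section, which this sketch does not attempt.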
We then examine the data acquired through the following lenses:
- How many references on average does each site have per article?
- Are there any sites that tend to self-cite?
- How diverse are the site’s references?
Results
How many references on average does each site have per article?
We care about this because we assume that articles referencing more sources are likely to be better researched, since the authors are able to draw on diverse viewpoints.
On examining the links provided by each site in aggregate, we get an interesting result:
Publisher | Reference links per article |
---|---|
Healthline | 26.4 |
WebMD | 12.1 |
Healthily | 0.7 |
We can see that Healthline has the most references per article, while on the flip side Healthily’s articles are barely supported by any references.
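The per-article average above is a plain mean over the sampled articles. A minimal sketch, assuming the links have already been collected into a dictionary keyed by article (the `references_per_article` name and the sample data are made up for illustration):

```python
from statistics import mean


def references_per_article(articles):
    """articles: dict mapping an article id to its list of reference URLs."""
    # Articles with no references still count towards the average.
    return mean(len(links) for links in articles.values())


# Illustrative data only, not the sample used in this study.
sample = {
    "article-a": ["https://example.org/1", "https://example.org/2", "https://example.net/3"],
    "article-b": ["https://example.org/4"],
    "article-c": [],
}
avg = references_per_article(sample)
print(round(avg, 1))
```

Note that articles without any references pull the average down, which is how a publisher can end up with fewer than one reference per article.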
Are there any sites that tend to self-cite?
With this question, we try to gauge how much quality there is in an article’s references.
Short of diving into each reference and understanding each source in depth, we have found an interesting way to gauge the quality of the linkages: does a site tend to refer more heavily to pages on its own site than to external authoritative sources?
This question matters because of the phenomenon of self-citing. Referencing and links are important, but we also want to know whether the article we are reading is part of a citation farm (not cool), or a means of extreme self-promotion (also not cool). Also, if references inadvertently come from the same source, we have to be concerned about the potential for circular reasoning and dependencies in a cycle of self-asserting premises (worryingly uncool).
So what does the data tell us?
Publisher | Self-citations (%) |
---|---|
Healthline | 50.38% |
WebMD | 82.64% |
Healthily | 100.00% |
We see that every publisher self-cites heavily, but some (WebMD, Healthily) are much more guilty of this than others.
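A self-citation percentage like the one above can be computed by comparing each link's hostname with the publisher's own domain. The sketch below is one plausible way to do it, not necessarily what Quackcheck does; the function names and the `publisher.example` domain are assumptions, and subdomains are treated as belonging to the publisher.

```python
from urllib.parse import urlparse


def is_self_citation(url, publisher_domain):
    """True if the URL's host is the publisher's domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return host == publisher_domain or host.endswith("." + publisher_domain)


def self_citation_pct(links, publisher_domain):
    """Percentage of links that point back to the publisher's own domain."""
    if not links:
        return 0.0  # assumption: no links means nothing to self-cite
    own = sum(is_self_citation(url, publisher_domain) for url in links)
    return 100.0 * own / len(links)


# Illustrative links only; "publisher.example" is a placeholder domain.
links = [
    "https://www.publisher.example/a",
    "https://publisher.example/b",
    "https://example.org/c",
    "https://notpublisher.example/d",
]
pct = self_citation_pct(links, "publisher.example")
print(f"{pct:.2f}%")
```

Matching on the dot-prefixed suffix avoids counting look-alike domains such as `notpublisher.example` as self-citations.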
Which site is more diverse in its linkages and draws on a more varied set of sources?
The diversity of links from varied sources can be a sign that the author of an article is widely read in the article’s subject. To get a sense of the diversity of linkages, we can graph the domain counts using a simple bar chart.
Because we know that the articles examined from Healthily are 100% self-citing, we can exclude them from this measure (because there isn’t any variance). But what about the remaining two?
In the above graph of the Healthline sample, we see a large bar at the start which represents the self-citations, but also a series of progressively smaller bars pointing at a good diversity of other sources.
The WebMD counts are smaller, as we saw in the first section, but we also see that this chart is dominated by the single large bar from self-citations, along with a series of much smaller bars for a varying set of other sources.
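The domain counts behind charts like these can be produced with a simple tally. Here is a rough sketch that counts links per domain and prints a text-only bar chart; the helper names and sample links are hypothetical, and stripping a leading `www.` is our own simplifying assumption.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_counts(links):
    """Count how many links point at each domain (a leading 'www.' is ignored)."""
    hosts = ((urlparse(url).hostname or "").lower() for url in links)
    return Counter(h[4:] if h.startswith("www.") else h for h in hosts if h)


def text_barchart(counts):
    """Render the counts as one line per domain, largest first."""
    if not counts:
        return ""
    width = max(len(domain) for domain in counts)
    return "\n".join(
        f"{domain.ljust(width)} {'#' * n} {n}" for domain, n in counts.most_common()
    )


# Illustrative links only.
links = [
    "https://www.publisher.example/a",
    "https://publisher.example/b",
    "https://publisher.example/c",
    "https://example.org/d",
    "https://example.net/e",
]
counts = domain_counts(links)
chart = text_barchart(counts)
print(chart)
```

In a chart like this, a self-citing publisher shows up as one long bar at the top followed by a tail of shorter ones.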
A more quantitative measure
Based on the above, we can see that Healthline probably draws from a more varied set of sources. But can we compare them in a clearer, more quantitative manner?
The answer seems to be: Yes, we can.
What we figured out is that we can adapt something called Shannon’s Entropy to suit our purposes. This measure involves a bunch of statistical voodoo, but in short, the higher the value of the measure, the higher the diversity of the sources. So a higher value is a good thing. In our setup we use a modified version of this algorithm, but if you want to know more about the original algorithm, I found this particular piece to be particularly helpful in understanding it.
Publisher | Shannon’s Entropy |
---|---|
Healthline | 6.21 |
WebMD | 5.45 |
Healthily | 0.0 |
From the above measure we can verify that Healthline does cite a more diverse set of sources in general compared to the other two.
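For the curious, here is a minimal sketch of the original, unmodified Shannon entropy applied to domain counts. It is not the modified version used for the table above, so it will not reproduce those exact numbers; the `shannon_entropy` name and the sample counts are made up for illustration.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Standard Shannon entropy in bits: the sum over domains of p * log2(1 / p)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p is the share of all links that point at a given domain.
    return sum(
        (n / total) * math.log2(total / n) for n in counts.values() if n > 0
    )


# A publisher that only ever cites itself: no diversity at all.
only_self = Counter({"publisher.example": 10})

# Links spread evenly over four domains: maximum diversity for four sources.
evenly_spread = Counter(
    {"publisher.example": 5, "example.org": 5, "example.net": 5, "example.com": 5}
)

print(shannon_entropy(only_self))
print(shannon_entropy(evenly_spread))
```

With plain Shannon entropy, a site that cites only one domain scores zero, and the score grows as links are spread more evenly over more domains.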
Conclusion
Based on the results of our simple experiment, we can say that in general Healthline articles are better researched.
They have more references, self-cite less, and have a higher diversity of citation sources than WebMD or Healthily. In comparison, WebMD is a relatively okay runner-up, while Healthily fails completely on each front.
If you want to acquire the power to do your own reference checking, have a look at Quackcheck, a tool built to give you the power to quickly verify how well-researched an article really is.