NEW YORK - At Integrity, our database of thousands of research
providers and industry experts provides the foundation for our
analysis. However, on occasion, people will wonder why we go to all the
trouble of collecting, organizing, and analyzing this information when
one can easily search for sources of information using Google. At a
panel discussion on specialized search tools during the recent Investorside conference, Penny Herscher, CEO of FirstRain, noted that she has also had to overcome similar skepticism.
Given that Google is, in all likelihood, the primary
information-finding tool in the rest of our lives, it is perfectly
understandable that analysts might turn to it first when doing
investment research. But Google on its own often fails to turn up anything
of value for financial research.
The most common problem cited with Google is that its broad coverage
universe tends to turn up irrelevant hits. This is perhaps the easiest
problem to remedy. One can learn to craft search queries more
carefully, using less ambiguous terms. It is also possible to
instruct Google to exclude sites that are most likely to be sources of
noise. For example, adding "-site:wikipedia.org" to your search string will
remove Wikipedia pages from your search results, while the CustomizeGoogle extension for Firefox
lets you exclude a range of sites from all of your searches. A related
complaint is that Google searches will exclude information contained
behind password-protected subscriber walls. But this is unfair. One can
hardly blame Google for failing to violate the intellectual property
rights of content owners.
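As a rough illustration of the query-crafting point, here is a minimal sketch of building a search string that excludes noisy sites. It assumes Google's "-site:" exclusion operator; the function name, the search terms, and the excluded domains are hypothetical examples, not anything drawn from our database.

```python
def build_query(terms, excluded_sites=()):
    """Join search terms and append a -site: exclusion for each noisy domain."""
    parts = list(terms)
    # Each excluded domain becomes a "-site:domain" token (assumed operator).
    parts.extend(f"-site:{site}" for site in excluded_sites)
    return " ".join(parts)


# Hypothetical example: specific terms plus two excluded domains.
query = build_query(
    ["semiconductor", "channel", "checks"],
    ["wikipedia.org", "finance.yahoo.com"],
)
print(query)
```

This only trims the noise. It does nothing about the order in which the remaining results are returned.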
The more serious reason why Google is insufficient for financial
research has to do with its sorting algorithm. As an experiment, I
recently created a custom Google search engine,
using a list of several thousand web sites from our internal database,
sites that we had judged to be sources of valuable research and
information. By doing this, we effectively eliminated the complaint
that Google's broad search universe would turn up irrelevant hits. One
would have expected that allowing Google to, essentially, search the
same network of sources as is present in our internal database would
have yielded excellent results. This list included a large number of
unique sources of in-depth research and analysis that we have collected
over a number of years. As it happens, it also included a much smaller
number of sites that are pretty widely known - bulge-bracket investment
banks and research sources that have a more retail focus.
Conducting a Google search on this restricted universe revealed the
real problem with Google. The search results were ranked in an order
that had nothing whatsoever to do with the depth of analysis or quality
of information present on the sites. The ranking was based purely on
popularity. Here is Google's description of how its PageRank algorithm works:
Democracy on the web works.
Google works because it relies on the millions of individuals
posting websites to determine which other sites offer content of value.
Instead of relying on a group of editors or solely on the frequency
with which certain terms appear, Google ranks every web page using a
breakthrough technique called PageRank™. PageRank evaluates all of the
sites linking to a web page and assigns them a value, based in part on
the sites linking to them. By analyzing the full structure of the web,
Google is able to determine which sites have been "voted" the best
sources of information by those most interested in the information they
offer. This technique actually improves as the web gets bigger, as each
new site is another point of information and another vote to be counted.
These democratic principles work fine for consumer needs. But, in
our test searches, the small number of widely known, retail-oriented
research sources completely dominated the search results, while
thousands of less well-known sites with deep information were nowhere
to be found in the first few results pages. PageRank is an algorithm
that assumes that the more widely known and popular something is, the
more valuable it must be. This, of course, is the exact opposite of how
institutional investors think about information: as something becomes
more widely known, it becomes less and less likely to yield any
sort of alpha. In fact, I believe I am on safe ground when I say that
there will never be any kind of actionable and profitable intelligence
on the front page of Yahoo! Finance.
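To make the popularity effect concrete, here is a minimal sketch of the textbook power-iteration form of PageRank, run on a made-up link graph. It is a simplification for illustration only, not Google's production ranking. The site names, the link structure, and the damping factor of 0.85 (a commonly cited value) are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: most sites link to two well-known names,
# and only one site links to the specialist source.
links = {
    "retail_portal": ["big_bank"],
    "big_bank": ["retail_portal"],
    "blog_a": ["retail_portal", "big_bank"],
    "blog_b": ["retail_portal"],
    "blog_c": ["retail_portal", "niche_research"],
    "niche_research": ["big_bank"],
}

ranks = pagerank(links)
ordered = sorted(ranks, key=ranks.get, reverse=True)
for site in ordered:
    print(f"{site}: {ranks[site]:.3f}")
```

In this toy graph the two heavily linked sites take the top two positions, while the specialist source with a single inbound link ranks far below them. Nothing in the calculation looks at the depth or quality of what any site publishes.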
Institutional investors need to be elitist and anti-democratic when
searching for information. The sources of most value to them are very
rarely going to do well on Google's PageRank algorithm. This is why
alternative search engines and proprietary databases are necessary if
one is to find any kind of informational advantage.
Posted at 12:51 pm by Ronit Bhattacharyya