Alternative Data

With newer technologies being introduced more frequently, AMCs and institutional investment firms are trying to adopt them earlier than the competition. Companies are searching for ways to improve their investment alpha, diversify existing investments and reduce the costs and risks of investing. Consequently, companies ranging from mammoth hedge funds to boutique investment banks have started using data gathered from alternative information sources such as geolocation, credit card transactions and email receipts. Techniques such as machine learning and data analytics are used to derive insights from this data, and those insights then inform changes to trading and investment strategies.

This white paper provides a detailed analysis of why and how companies gather and use alternative data to make better investment decisions.

The Alternative Data space

We all know the benefits that big data analytics provides. Its ever-increasing use by new and existing companies clearly shows its effectiveness. But it also raises a question: if every individual and company makes use of big data, how does a company get ahead of the competition? This is where alternative data comes to the rescue.

As Krishna Nathan, CIO of S&P Global defines it, "Alternative data draws from non-traditional data sources so that when you apply analytics to the data, they yield additional insights that complement the information you receive from traditional sources". Sources of alternative data include credit card transactions, geolocation, public sources and so on.

The rise of the Alternative Data world

The global alternative data market was valued at $1.06 billion in 2019 and is estimated to grow at a compound annual growth rate (CAGR) of 40.1% from 2020 to 2027. A major driver of this growth is the large number of alternative data types that have become available in the market over the last decade. The most commonly used sources include financial receipts, transactions and web scraping. Newer sources that have recently gained popularity include satellite images, sensors, mobile data, social media and IoT-enabled devices. The boost in alt-data growth can also be attributed to rising demand in the hedge fund industry. Nowadays, a majority of hedge fund managers use alternative data to gain a competitive advantage by outperforming their peers and supporting the risk management process.
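As a rough illustration of what a 40.1% CAGR implies, the sketch below compounds the 2019 valuation forward. It assumes 2019 is the base year and that growth compounds once per year through 2027; the function name and the period count are assumptions made for this example, not figures taken from the cited research.

```python
def project_value(base_value: float, cagr: float, years: int) -> float:
    """Compound base_value forward by `years` periods at annual rate `cagr`."""
    return base_value * (1.0 + cagr) ** years


# Assumption: 2019 is the base year and growth compounds annually through 2027.
market_2019_bn = 1.06
projected_2027_bn = project_value(market_2019_bn, 0.401, 2027 - 2019)
print(f"Projected 2027 market size: ${projected_2027_bn:.2f} billion")
```

Changing the number of compounding periods (for example, treating 2020 as the base year) changes the result noticeably, which is why the base-year assumption matters when comparing published forecasts.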

For many years, alternative data was considered merely an additional tool used to supplement fundamental data. Companies sent interns and staff to shopping malls to physically count visitors and gauge crowd levels in order to get an idea of quarterly revenue. But the landscape has changed now.

The alternative data providers' landscape has also seen a continuous and steep rise with the current number being 450+. As per EY global HF and Investor Survey, 78% of funds and investors use or expect to use alternative data. IBM has also stated that 90% of the data created and circulated in the last two years is alternative data.

“I think in three to five years, everyone will be using alternative data,” said George Goldman, Vice President and Head of Finance Sales at Dataminr in 2016. “It’s not an if, it’s a when do people figure out how to incorporate this into their investment processes. If you’re not at least thinking about it now, you’re going to be left behind.”

These are some highly prominent players in the alternative data market:

  • Dataminr
  • Earnest Research
  • M Science
  • RavenPack
  • 1010Data
  • UBS Evidence Lab

Companies like Quandl emerged back in 2012 with operations focused solely on providing alternative datasets and insights. At that time, business was difficult for them. Now, the company has become NASDAQ's major source for alternative data. It builds investment models by identifying datasets of local firms and has partnered with several insurance companies throughout the U.S. Through this partnership, Quandl gets quick access to data such as insurance policy information on new car purchases. It categorizes vehicle sales data by vehicle model, manufacturer, region and buyer demographics. These insights help investors increase the profitability of their investments.

The alternative data industry has also proven rewarding for buy-side employees. The number of full-time alternative data employees has increased by 450% in the last 5-7 years.

Types of Alternative Data

Examples and types of alternative data include:

Industry Insights

Information about the adoption of alternative data by different industries can help in predicting how well a particular industry will prosper. The article "Alternative Data Market Size, Share & Trends Analysis Report By Data Type on Banking" describes the major adoption trends in the Banking, Financial Services and Insurance (BFSI), retail, and IT and telecommunication industries:

  • The industry that has flourished the most over the past couple of decades is the Banking, Financial Services and Insurance industry. It accounted for 15% of global revenue in 2019. A major part of this growth can be attributed to the industry's early adoption of alternative data for investment decisions and portfolio construction.
  • The BFSI industry is followed by the retail industry, which is predicted to grow at a compound annual growth rate (CAGR) of 42% from 2020 to 2027. Some of the most prosperous segments of this industry include the real estate, transportation and energy sectors, and alternative data has played a major role in this. For example, the e-commerce and transportation sectors have been using web scraping and social media (both types of alternative data) to understand what kinds of products customers like and thereby encourage target customers to buy.
  • Other industries benefiting from application of alternative data include IT and telecommunication, Media,

Most used data types' insights

One of the oldest and most used segments, credit and debit card transactions, has led the alt-data market since its emergence and accounted for 14% of global alt-data revenue in 2019. This substantial share can be attributed to two factors: high demand for the data among investors and managers, and the presence of a number of providers. This category is expected to account for a large share of the estimated alt-data growth because customer expenditure data can be sorted by gender, age, seller, geography and so on. Companies are combining other types of data with transaction data to extract hidden insights on consumer expenditure patterns, thereby enabling investors to invest in profitable businesses.

The use of geolocation data from satellites is also gaining popularity, as it indicates the foot traffic at a store on a particular day or at a particular time. When examined in conjunction with other data such as credit card transactions, it becomes a key input in determining the store's hidden value and how well its operating strategies are working.

Another category is social media and sentiment data. This data is also expected to show significant growth, because rising demand for mobile phones and smartphones will increase demand for smartphone usage data among investors. Retail companies use this data to understand how users behave in e-commerce applications. It is also used to understand the areas of interest of various groups and regions.

Regional Insights

Looking at alternative data adoption by geographical region, North America sits at the top of the list. Its large share of the global alt-data market has contributed significantly to the region's fast economic growth.

The research on alternative data describes the adoption trends in North America and Asia Pacific as follows:

  • "North America dominated the market and accounted for over 33% share of global revenue in 2019 and is anticipated to maintain its dominance over the forecast period. The high share of the region is attributed to the presence of numerous players in the market, such as Advan, Dataminr, Eagle Alpha, M Science, and UBS Evidence Lab. The early adoption of alt-data from different industry verticals in the country also results in a high market share. Currently, more than 70% of the asset managers in the U.S. are inclined towards the use of non-traditional data such as alternative data in their investment process."
  • "The Asia Pacific region is expected to emerge as the fastest-growing regional market because of the increasing use of data-driven research by the investors. The regional market is anticipated to open significant growth opportunities for companies from emerging economies, such as India, Singapore, Thailand, and China."

Alternative data performance

Five most popular data types: Social / Sentiment Data, Private Company Data, Credit Card Data, Supply chain data, and Web data, as per EY Global HF and Investor Survey.

Funds using Dataset

How Alternative Data adds value: Notable cases

Alternative data is setting new highs every year in terms of adoption by companies. The primary drivers of adoption include:

1. Competitive dynamics / edge:

Asset managers and wealth managers commonly have access to traditional data and information through traditional information channels. Innovative asset managers are seeking an upper hand through non-traditional, alternative data. Adoption of alternative data among US hedge fund managers has already reached 78%. A 2017 EY study reported, "A large quantity of managers see effective use of data and analytics as a key competitive advantage for the future. Smaller managers moved first but managers of all statures and strategies are now experimenting with big data analytics and AI".

2. Growing evidence of alpha in alternative data:

A 2017 Greenwich Associates survey of asset managers highlighted that 90% of asset managers using alternative data are seeing an increase in return on investment (ROI). The importance of the alpha provided by alternative data can be understood from the following facts:

  • A majority of asset managers have been open with the press about their alpha returns.
  • A couple of alternative datasets are more successful than the rest because their providers are big firms, for example Dataffirm.
  • More renewals: Aggregators of alternative datasets are seeing increasing number of renewal licenses.

3. Providers of AUM expect to increasingly allocate their assets to managers who are utilizing alternative data:

It is quite apparent that investors and funds are trusting their wealth and assets only with those asset managers who are utilizing alternative data one way or the other. An EY report states that, "Given the developments in fintech and excitement surrounding the technological capabilities to rapidly analyze different datasets, it is not surprising that investors are expecting an increased percentage of their hedge fund managers to be using non-traditional data and new analytics in their investment process."

Looking at the statement, it can be concluded that investors view these advancements as an opportunity for managers and those who are able to effectively utilize these capabilities have a distinct advantage compared to the peers who do not do so.

4. Risk of being at a strategic disadvantage in the medium to long term:

It is hard to find a company in the field that is not using alternative data in at least one way. Firms that neglect alternative data in their strategy formulation are at a strategic disadvantage. Every day, more and more companies are applying alternative data, so peers that follow only traditional approaches risk falling behind in the long run. Organizations ranging from universities like New York University and public institutions like the Federal Reserve Banks to mammoth private firms like Amazon and Google are using alternative data.

Alternative data for private investing/risk management

Data for private investing: Private companies are not required to disclose much of their data, which makes this an opaque market. So, what can venture capitalists and private investors do? They can approximate a private company's performance using proxies such as hiring and consumer activity, which can be tracked with alternative data.

Newswires: Traditionally, newswire datasets (Bloomberg News, Refinitiv, RavenPack) are used to make decisions about trading publicly tradable assets. News volume is used as a risk management tool for detecting abnormal news flow. Newswires are also used to track market sentiment regarding larger private firms before an IPO, e.g. the Uber IPO.

Social Media: Twitter offers full-speed access to:

a) History that goes back as far as 10 years.

b) Present and past market sentiment of brands.

c) Information about specific keyword counts.

Web Scraped Data: Companies like Thinknum (an alternative data provider) use web scraping to gather statistics about firms in a structured manner. A large number of companies can be covered by this method. It is used to capture specific information about a company, such as job postings, store locations, LinkedIn details and web traffic.
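The use of news volume to detect abnormal news flow, mentioned under Newswires above, can be sketched as a simple trailing z-score check. This is a minimal illustration, not any vendor's method: the function name, the five-day window and the threshold of three standard deviations are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def abnormal_news_days(daily_counts, window=5, threshold=3.0):
    """Return indices of days whose article count exceeds the trailing-window
    mean by more than `threshold` sample standard deviations."""
    flagged = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip windows with no variation to avoid dividing by zero.
        if sigma > 0 and (daily_counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical daily article counts for one company.
counts = [12, 15, 11, 14, 13, 12, 48, 14]
print(abnormal_news_days(counts))  # [6]
```

A flagged day would typically prompt a closer look at the underlying stories rather than an automatic trade.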

Best Implementation and industry practices

It's clear that all forms and formats of alternative data hold great value for wealth managers looking for new investment opportunities, whether a short-term trading advantage or a long-term portfolio strategy. It is equally clear that collecting appropriate data, integrating it, using it and finally converting it into a strategy for investment decisions is no simple task.

Before finalizing any deal involving alternative data, both the seller and the buyer need to be aware of the legal aspects and investigate them beforehand. Questions to check include:

Is the data allowed to be sold? Check whether there are any GDPR or consent issues.

Has the personal data been scrubbed?

Does the data need to be aggregated or blurred before being sold?

Are there any issues with exclusive datasets?
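To make the idea of scrubbing and aggregating data before sale more concrete, here is a minimal sketch. It removes row-level identifiers by aggregating to group level and suppresses groups too small to hide an individual. The field names and the minimum group size are hypothetical, and the threshold is illustrative only; it is not a legal standard.

```python
from collections import defaultdict


def aggregate_for_sale(records, group_key, value_key, min_group_size=5):
    """Aggregate row-level records to group level and drop small groups.

    Identifiers such as card numbers never appear in the output because only
    the group label, a row count and a total are kept.
    """
    totals = defaultdict(float)
    sizes = defaultdict(int)
    for row in records:
        totals[row[group_key]] += row[value_key]
        sizes[row[group_key]] += 1
    return {
        group: {"count": sizes[group], "total": round(totals[group], 2)}
        for group in totals
        if sizes[group] >= min_group_size
    }


# Hypothetical transaction rows; "card_id" is the identifier to be removed.
transactions = [
    {"card_id": "c1", "region": "NE", "amount": 20.0},
    {"card_id": "c2", "region": "NE", "amount": 35.5},
    {"card_id": "c3", "region": "NE", "amount": 12.25},
    {"card_id": "c4", "region": "SW", "amount": 99.0},
]
print(aggregate_for_sale(transactions, "region", "amount", min_group_size=3))
```

Here the single "SW" transaction is suppressed, since publishing a group of one would effectively expose an individual's spending.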

How Alternative Data is collected

Web Directories

One of the most convenient and probably most budget-friendly sources is web directories. Often, whole datasets can be found listed on websites. Such directories provide datasets in an organized and direct manner, with little hassle for the visitor.

Data firms which aggregate alternative data

There are several websites and companies that provide alternative data for investors, e.g. Bloomberg, Quandl, Eagle Alpha and so on. When their own sources fall short, alternative data aggregators also strike deals with each other to share data. One typical arrangement in such cases is taking a revenue share from the underlying supplier.

Direct/raw source of data

This source requires the most work, as the investor or fund manager must contact companies directly, either online or in a physical meeting. Several companies refuse to provide any data, so the work can be challenging, hectic and time-consuming.

Data strategists/scouts:

  • Within funds, there are data strategists who:

i) Search for datasets.

ii) Act as a bridge between external data firms and internal portfolio managers and data scientists.

  • External data scouts:

i) Do the collection work in place of data scientists.

ii) Act as intermediaries between data users and data firms.

iii) Are paid by the buy-side firm, not by the data firm.

Own data:

Every organization has internal data. Financial organizations are no different, especially sell-side firms. The difficulty is that the data isn't catalogued in an organized manner, and not every team is aware of the potential alternative datasets hidden within its own department.

  • As a start, this can be addressed by creating a web directory of datasets, which allows better browsing and provides clarity.
  • Centralizing data sourcing is also a solution. It can lead to better negotiation of deals with data vendors than negotiating team by team. It can also help keep better track of data subscriptions and avoid unnecessary duplication of data.
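A central directory of datasets can start out very simply. The sketch below models catalogue entries and reports datasets that more than one team is paying for separately. The class, field names, vendors and costs are all hypothetical and exist only to illustrate the idea.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    name: str
    vendor: str
    owner_team: str
    annual_cost: float


def duplicate_subscriptions(catalog):
    """Map (vendor, dataset name) to the teams licensing it, keeping only
    datasets licensed by more than one team."""
    teams = defaultdict(set)
    for entry in catalog:
        teams[(entry.vendor, entry.name)].add(entry.owner_team)
    return {key: sorted(members) for key, members in teams.items() if len(members) > 1}


# Hypothetical catalogue entries.
catalog = [
    DatasetEntry("card_panel", "VendorA", "equities", 120000.0),
    DatasetEntry("card_panel", "VendorA", "credit", 120000.0),
    DatasetEntry("web_traffic", "VendorB", "equities", 40000.0),
]
print(duplicate_subscriptions(catalog))
```

In this example both the equities and credit teams license the same card panel, which is the kind of overlap a centralized sourcing function could renegotiate as a single contract.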

Choosing the appropriate Alternative Dataset

Looking at multiple datasets in conjunction is better than making a decision based on only one. For example, increased foot traffic in a mall does not indicate increased sales if the credit card transaction receipts for the same mall show a declining trend.
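The foot traffic example can be expressed as a small cross-check between two series. This is a minimal sketch under simple assumptions: both series cover the same period, the first value of each is non-zero, and direction of change over the whole period is the only thing compared. The function names and figures are made up for illustration.

```python
def pct_change(series):
    """Relative change from the first to the last observation."""
    return (series[-1] - series[0]) / series[0]


def signals_agree(foot_traffic, card_spend):
    """True only when both series moved in the same direction over the period.

    A flat series yields False, i.e. it is treated as no confirmation.
    """
    return pct_change(foot_traffic) * pct_change(card_spend) > 0


# Hypothetical monthly figures for one mall.
foot_traffic = [1000, 1100, 1250]
card_spend = [50000.0, 47000.0, 44000.0]
print(signals_agree(foot_traffic, card_spend))  # False
```

Here visits rose while spending fell, so the foot traffic signal on its own would have been misleading.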

There are a number of different criteria we need to apply to a dataset before we can do any kind of shortlisting and further analysis. There are simply too many datasets to test, hence, we need to have some sort of criteria for shortlisting the ones we’d like to look at first.

For example, a quantitative trading firm will look for datasets that can be used to predict stock prices and execute trades accordingly. By contrast, a discretionary firm will look for data on a smaller number of assets and dig deeper into each one to bring out more detail.

Industry practices and implementation

Investment management firms seeking to exploit the opportunities alternative data may present can take proactive steps to ensure they get a return on their investment of time and resources as they approach new data sets.

Data gathering and assessment practice

The first step in general industry practice is to understand what kind of data is needed to meet operational needs. Next comes the procurement of data, which is not as straightforward as it may seem. Investment and hedge fund managers have to locate the data by contacting data suppliers personally, which is a tedious task. What is actually needed is undiscovered data that can provide an edge.

The company or origin source of the data may not be aware of the demand for their data. It’s also improbable that the originator has ‘productized’ the data set to make it consumable by others. For the investment firm, this means that in order to determine whether it can be put to the use intended, they will need to employ data science processes to determine the accuracy and completeness of the data to assess its value.

According to one alternative data white paper:

  • "It’s practicable to identify five or 10 prospect sources, and apply the data to models to test their usefulness. These testing models should analyze how the data will be used and whether the actual data is fit for this purpose. This may include analysis of underlying code where appropriate, back-testing against models and applications that will use the data set, and talking to internal users about the potential value. While there are tools now available to help in this process, it remains a time-consuming task."
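One simple way to compare a handful of prospect sources is to score each candidate signal against subsequent returns. The sketch below uses a directional hit rate. It is far from a full back-test (it ignores transaction costs, position sizing and look-ahead checks), and the source names, signals and returns are hypothetical.

```python
def hit_rate(signal, next_returns):
    """Fraction of periods in which the signal's sign matched the sign of the
    following period's return. Periods with a zero signal or return are skipped."""
    pairs = [(s, r) for s, r in zip(signal, next_returns) if s != 0 and r != 0]
    if not pairs:
        return 0.0
    hits = sum(1 for s, r in pairs if (s > 0) == (r > 0))
    return hits / len(pairs)


def rank_sources(candidates, next_returns):
    """Sort candidate sources from highest to lowest hit rate."""
    scores = {name: hit_rate(sig, next_returns) for name, sig in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical next-period returns and candidate signals (+1 bullish, -1 bearish).
next_returns = [0.01, -0.02, 0.015, -0.005]
candidates = {
    "web_traffic": [1, -1, 1, 1],
    "job_postings": [-1, -1, 1, 1],
}
print(rank_sources(candidates, next_returns))
```

With these made-up inputs the web traffic signal matches the direction of returns in three of four periods and ranks first; a real evaluation would need far more history before drawing any conclusion.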

Implementation and integration practice

Because alternative data was not a popular source in the early days of the tech revolution, there was not sufficient technology to make use of it. Early adopters therefore had to spend a large share of their budgets on building their own platforms to analyze it. Other early adopters turned to big data technologies such as Hadoop and Spark to integrate various data types. Companies routinely collected less eye-catching but useful data such as telephone records, website requests, forms, cookies and other types of non-traditional data to use as inputs for trading models.

As stated in an alternative data white paper:

  • "For investment managers, the Holy Grail is to be able to receive non-traditional data services in a standardized format via an industry-accepted data platform or API, making it available through the same mechanisms used to consume market prices, earnings estimates and ratings information. To achieve this, data teams at financial institutions are making use of analytics languages like R and Python, as well as tools like Tableau, Excel and Mathworks’ MatLab to create a professional consumption experience and simplify the life of the analyst or other user that has to deal with the data. The goal is data that looks and feels like data consumed via traditional means, and that is documented and supported by the supplier."
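The goal of consuming non-traditional data in a standardized format can be illustrated with a small adapter layer that maps each vendor's layout onto one common record type. This is only a sketch: the vendor layouts, field names and the Observation schema are invented for the example and do not describe any real provider's feed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """Common schema that downstream models consume, whatever the vendor."""
    ticker: str
    as_of: date
    metric: str
    value: float


def from_vendor_a(row):
    # Hypothetical layout: {"symbol": "abc", "date": "2020-03-31", "visits": "1200"}
    return Observation(
        row["symbol"].upper(),
        date.fromisoformat(row["date"]),
        "foot_traffic",
        float(row["visits"]),
    )


def from_vendor_b(row):
    # Hypothetical layout: {"tkr": "ABC", "yyyymmdd": "20200331", "spend_usd": 53000.5}
    raw = row["yyyymmdd"]
    as_of = date(int(raw[:4]), int(raw[4:6]), int(raw[6:8]))
    return Observation(row["tkr"].upper(), as_of, "card_spend", float(row["spend_usd"]))


observations = [
    from_vendor_a({"symbol": "abc", "date": "2020-03-31", "visits": "1200"}),
    from_vendor_b({"tkr": "ABC", "yyyymmdd": "20200331", "spend_usd": 53000.5}),
]
for obs in observations:
    print(obs)
```

Once every source is mapped to the same record type, analysts can join and compare datasets without knowing each vendor's quirks.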

Inexperienced data sources

There is a great chance that the alternative data provider is inexperienced, which can lead to a different set of problems. Investment firms and managers need to ensure that data is received from reliable and trustworthy sources, particularly when the data is of key value for trading decisions. In short, the data needs to be there when it’s needed. Similarly, the data needs to be documented for both use and administration.

Given the inexperience of many alternative data suppliers, investment firms may need to build their own teams and infrastructure to deal with the gaps in experience or resources among their data providers. Alternatively, they can turn to emerging middlemen who can help ensure reliability and availability of service.

Data Quality

The issue of data quality remains even after you have successfully found the data, integrated it with internal systems and made it consumable. Some level of due diligence is required to ensure that the data is complete, accurate and fit for purpose. It is also important to check the data hygiene of the originator by reviewing time-stamps, checking for selection bias and verifying data validation.
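A few of these checks can be automated. The sketch below summarizes completeness, duplicate time-stamps and out-of-order records for a delivered dataset. The field names and the choice of checks are assumptions for illustration; selection bias in particular cannot be detected by a mechanical check like this.

```python
from datetime import date


def quality_report(rows, required_fields, date_field="as_of"):
    """Summarize basic hygiene: rows with missing required fields, duplicate
    time-stamps, and records that arrive earlier-dated than their predecessor."""
    missing = sum(
        1 for row in rows if any(row.get(field) is None for field in required_fields)
    )
    dates = [row[date_field] for row in rows if row.get(date_field) is not None]
    duplicates = len(dates) - len(set(dates))
    out_of_order = sum(1 for prev, cur in zip(dates, dates[1:]) if cur < prev)
    return {
        "rows": len(rows),
        "rows_missing_fields": missing,
        "duplicate_dates": duplicates,
        "out_of_order": out_of_order,
    }


# Hypothetical delivery with one missing value, repeated dates and a late record.
rows = [
    {"as_of": date(2020, 1, 1), "value": 10.0},
    {"as_of": date(2020, 1, 2), "value": None},
    {"as_of": date(2020, 1, 2), "value": 12.0},
    {"as_of": date(2020, 1, 1), "value": 9.0},
]
print(quality_report(rows, required_fields=["as_of", "value"]))
```

A report like this is a starting point for conversations with the supplier, not a substitute for reviewing how the data was collected.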

Third-party integrators

Given the uncertainties that come with datasets from inexperienced or untrustworthy sources, sell-side firms act as intermediaries and help connect the buyers and sellers of alternative data. They help data originators package and deliver datasets to the buy-side client. This also improves the relationship among the three parties.

One alternative data white paper summarizes the role of third-party integrators as follows:

  • "In summary, some investment firms are equipped to source and integrate these data sets themselves, while others want to buy product solutions. Whichever route they decide to take, the reality is that firms are better equipped than they were just a couple of years ago, as new GUI tools, reporting systems and assessment groups emerge in the marketplace. Getting data in shape for consumption takes hard work. Data sets may need an additional source to make it useful, other data may need to be anonymized before it can be consumed. Some data is unverifiable. But data suppliers and consumers and the middlemen that serve them are going to great lengths to ensure the new alternative data sets are reliable and accurate. "

Risks to look out for

Collecting and successfully integrating alternative data is not enough if the data is gathered from a non-public source or does not comply with the applicable rules. Four types of risk are associated with the collection and implementation of alternative data:

Data risk

Gathering alternative data usually carries greater risk than gathering traditional data. If the data gatherer is unskilled or inexperienced, chances are they will gather data that the company is either not allowed to use or would not willingly use. Some of these data risks include:

Data provenance risk: This risk is related to the unlawful and in some cases, illegal collection and retrieval of data.

Accuracy or validity risk: This risk is related to the situation in which the data gathered turns out to be unreliable or inaccurate.

Privacy risk: This risk is related to the situation in which personal data (Personally Identifiable Information) is used without appropriate permission.

Material nonpublic information (MNPI) risk: This risk is related to the gathering of data from non-public sources, which can expose the firm to legal consequences if the information is material and is used for trading.

Model risk

Most investment managers focus their efforts on fundamental and quantitative data when forming investment models. When a promising, reliable and accurate alternative dataset comes to their attention, it can affect the investment model drastically, so that a new model has to be built to take the new data into account. This risk of a change in strategy due to new data input is known as model risk.

According to the alternative data white paper by Deloitte, "In addition to the risks above, model risk includes risk that the alternative data may be incorporated in the model incorrectly, that the trading signal generated may be irregular or inconsistent under certain conditions, and that the output of the model could be improperly linked to the trading process. Strong controls around risk overall can serve to mitigate these alternative data-related risks."

Regulatory risk

There are still no concrete laws on acceptable practices for the gathering and use of alternative data, but every firm should work within the laws that regulatory bodies have already laid down. Regulatory risk is easier to understand with an example. Imagine a company hires an inexperienced and unskilled data researcher who manages to find an alternative data source that helps to accurately predict a company's future sales. It would be tempting to use that data year after year to make investment decisions and generate profits. But such a situation calls for thorough research into the data in terms of its copyright, confidentiality and terms of use, and into any missteps the researcher might have made in obtaining it.

Talent risk

Keeping a competitive edge in a constantly improving world also carries a risk related to talent and creative approach. Fund managers need to formulate creative and effective strategies on a regular basis to improve and maintain their investment alpha. A firm that follows traditional strategies and approaches, is not up to date with new technology and is unable to collect, use and implement appropriate alternative data risks falling behind in the long run.

The future of Alternative Data

If it was needed to sum things up in one sentence, it would be:

"Adoption Will Continue"

Given the data sources and facts discussed above, the number of alternative data providers and consumers is likely to keep growing. Data providers will seek to outplay their competition and will therefore introduce new and innovative techniques for delivering better data quality. The focus on value will lead to consolidation among market players.

As for the availability of data, the fact that "90% of the data available today was generated in the last 2 years and its supply is estimated to grow by 2.5 quintillion bytes per day" says it all. It is speaking loud and clear that there won't be any lack of data availability. Instead, this abundance of data will probably act as a hurdle for managers and research specialists as they will have to spend more time of their day on figuring out how to capture appropriate data, integrate it and derive sufficient value from it.

Many theorists believe that there are still more insights left to be derived than what is actually derived from the currently available alternative data sources. To realize the full potential of alternative data, fund firms need to have the technology infrastructure and staff resources to aggregate, store, synthesize, analyze and back-test the data.

Looking at the size of organizations in relation to adoption, large established firms will surely have enough funds at their disposal for investment in alternative data as a core process, AI powered analytics and hiring data scientists for insights extraction. Small and mid-size players will probably face difficulty in doing so.

Such firms may be able to catch up by working with a technology provider that has already invested in both professional expertise and the latest technology to support alternative data initiatives. Fund managers can grow by outsourcing alternative data aggregation, management and analysis, that is, everything except the investment process itself, which is proprietary to each firm.


Doug Dannemiller, Alternative data for investment decisions, Deloitte.

SS&C tech, The future for the alternative investment industry, AIMA.

Alternative Data Use cases, Eagle Alpha.

Alternative Data: Application and best practices for investment management firms, TradingTech Insight.

Anusha Sivaramakrishnan, Alternative Data in investment management: A perspective, Tata Consultancy Services.

Saeed Amen, Alternative data for investors, Cuemacro.

Alternative Data in Financial Markets, Dataiku.