Data Deep Dive: Open Source Contributor Index (OSCI) 2016-2020 Analysis

If you enjoy analyzing the monthly Open Source Contributor Index (OSCI) rankings, then you’re going to love our deep dive into the 2016-2020 historical data we’ve pulled from GitHub on which commercial organizations contribute the most to open source. The data makes it clear that open source keeps growing in importance and that the community keeps expanding: the number of Active Contributors among the top 30 companies in the OSCI has increased by 75% since 2016.

UPDATE: This blog has been updated with analysis for Q1 2020. We have kept the original analysis through the end of 2019 and supplemented it with commentary on the trends visible in the first part of 2020.

Introduction

Before getting into our findings, let’s break down the methodology that led us to these conclusions:

  • Timeframe: January 2016 through March 2020
  • Logic: To assess the number of people from commercial organizations who contributed to open source projects from 2016-2020, we used the same approach as for the OSCI rankings. We use publicly available GitHub commit event data from GH Archive. This includes all commits to public GitHub projects, which are generally open source. For this analysis, we measured the total number of Active Contributors (10+ commits) at each organization.
  • Tiers: To make it easier to analyze trends in a meaningful way, we split the rankings into four tiers. The tiers are not based on a fixed number of companies for 2016 vs. 2020, but rather break up different parts of the data to gain better insights into the open source community. More on this as we blog our way through the rankings!
  • Result: We graphed the rankings of these companies over the last four years, so that we can see trends in the open source community and which companies retain their positions, rise or fall over time.
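The Active Contributor measure described in the list above can be sketched in a few lines of Python. This is a minimal illustration and not the OSCI pipeline itself: it assumes the commit events have already been reduced to (author, organization) pairs, one per commit, and the names `active_contributors`, `rank` and the sample organizations are hypothetical.

```python
from collections import Counter, defaultdict

# From the methodology: an Active Contributor has 10 or more commits.
ACTIVE_THRESHOLD = 10


def active_contributors(commits, threshold=ACTIVE_THRESHOLD):
    """Count Active Contributors per organization.

    `commits` is an iterable of (author_id, organization) pairs, one per
    commit. How authors are mapped to organizations is assumed to have
    been done beforehand.
    """
    commits_per_author = Counter(commits)  # (author, org) -> commit count
    active = defaultdict(int)
    for (_author, org), count in commits_per_author.items():
        if count >= threshold:
            active[org] += 1
    return dict(active)


def rank(active_by_org):
    """Order organizations by Active Contributors, largest first.

    Ties are broken alphabetically, which is an arbitrary choice here.
    """
    return sorted(active_by_org, key=lambda org: (-active_by_org[org], org))


# Hypothetical sample: two authors reach the threshold, one does not.
sample = (
    [("alice", "ExampleCorp")] * 12
    + [("bob", "ExampleCorp")] * 3
    + [("carol", "OtherCo")] * 10
)
counts = active_contributors(sample)
ranking = rank(counts)
```

In this sample only alice and carol count as Active Contributors, so each organization ends up with one and the order is decided by the tie-break.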

High-Level Rankings (2016-2019)

To start, let’s look at all the companies that made it into the top 30 at some point between 2016 and 2019. Interestingly, 38 companies in total made an appearance, which shows how competitive these rankings truly are. Note that this graph does not show the quantity of Active Contributors, just ranking over time. Despite the density of this graph, we can see the relative stability of the companies that ranked highly compared with the volatility of the companies that ranked farther down the list. Let’s break down this data into tiers to better understand what trends we can observe.

Q1 2020 Update

This table compares positions at end of March 2020 with positions at end of 2019.

In this table, we observe some notable movements:

  • Google has risen from 2nd to 1st place. As shown in the 2016-2019 graph below, Microsoft was consistently in 1st place over the last three years. However, the gap between the two companies in the number of Active Contributors so far in 2020 is small, so positions may change again as the year continues.
  • JetBrains made a significant rise from 21st place to 13th. However, our historical data for 2019 shows that at the end of Q1 2019 JetBrains was in 14th place, before dropping back seven places by year end.
  • Four companies appear in the top 30 for the first time: Odoo (up 18 places to 17th), Elastic (up 11 places to 25th), Datadoghq (also up 11 places to 28th) and MongoDB (up from 41st to 30th). In fact, we already called these companies out as ‘rising stars’ in our original analysis: companies that had risen consistently in the rankings over the past few years.
  • The companies displaced from the top 30 were WIX (from 29th to 36th), Liferay (28th to 33rd), Tencent (24th to 37th) and Alibaba (18th to 31st).

Finally, we must note that it is early in the year. Smaller companies with a dedicated focus on open source tend to appear at the top of the rankings early in the year, then drop back as they are overtaken by larger corporations that invest continuously in open source over the course of the year.

Tier 1 Analysis (2016-2019)

Tier 1 is comprised of the top five companies on the list, each having significantly more than 1,000 Active Contributors from 2016 to 2019. Despite an overall uptick in the quantity of Active Contributors, Tier 1 rankings haven’t changed at all since 2016. Microsoft (with a 97% increase in Active Contributors) and Google (99%) have the most notable increases, reinforcing their position at the top of the ranking, while IBM shows the slowest, yet still very significant, growth at 50%. The gap between the second- and third-ranked companies (Google and Red Hat) has grown over time, unlike the gap between the fifth- and sixth-ranked companies (Intel and Facebook), which has remained consistent over the last four years.

Tier 2 Analysis (2016-2019)

Tier 2 is comprised of the next set of companies and features major IT providers and open source leaders. Amazon had the most significant rise in position, with a major increase in Active Contributors of 421%. SAP (161%) and Facebook (132%) also had dramatic rises in position. Like Amazon, Oracle moved up from Tier 3 to Tier 2 from 2016 to 2019 with an increase of 130%. Companies that stayed in this tier from 2016 to 2019 increased their Active Contributors by an average of 90% over time, which equates to an annual 23.8% growth rate. We can see that Tier 2 still shows relatively low volatility, as the majority of companies in this tier in 2016 remained in 2019 (with the exception of Samsung, Cisco and ThoughtWorks).
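The annual growth rates quoted for the tiers appear consistent with a compound annual growth rate over the three yearly intervals between 2016 and 2019. That reading is an assumption, since the exact formula is not stated here, and the function name below is hypothetical. A short sketch of the arithmetic:

```python
def annual_growth_rate(total_increase, intervals):
    """Convert a total fractional increase into a compound annual rate.

    total_increase: 0.90 means a 90% increase over the whole period.
    intervals: number of yearly steps (assumed: 3, from 2016 to 2019).
    """
    return (1.0 + total_increase) ** (1.0 / intervals) - 1.0


# Average increases quoted for companies that stayed in Tier 2 and Tier 3.
tier2_rate = annual_growth_rate(0.90, 3)
tier3_rate = annual_growth_rate(0.89, 3)
print(f"Tier 2: {tier2_rate:.1%}, Tier 3: {tier3_rate:.1%}")
```

Both results land within a fraction of a percentage point of the quoted figures; the small differences are presumably because the published averages are rounded.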

Tier 3 Analysis (2016-2019)

Tier 3 is comprised of the next 10 companies. The major growth companies included Nvidia (216% increase), Adobe (168% increase), ARM (161% increase) and EPAM (117% increase). Samsung and Cisco both dropped from Tier 2 to Tier 3 over time, with decreases of 38% and 3%, respectively. Companies that stayed in this tier from 2016 to 2019 increased their Active Contributors by an average of 89% over time, with an annual 23.7% growth rate. In this tier, we see large volatility – only half of the companies in Tier 3 in 2016 have remained in 2019.

Long Tail Analysis (2016-2019)

Following Tier 3 is what we refer to as the Long Tail. It runs from 24th place to 30th and shows much larger fluctuations in this diagram than the previous tiers.

Making Moves: Key Risers & Fallers (2016-2019)

As we’ve shown in the above analysis, it’s clear that a significant increase in contributions to open source projects was necessary for a company to keep its position in the top 30.

As mentioned previously, Amazon (421%) had the most significant increase in the number of Active Contributors. Tencent (291%), WIX (233%), Nvidia (216%), AMD (187%), Adobe (168%), SAP (161%) and ARM (161%) saw major growth over time. Strong growth was also noted by Facebook (132%), Oracle (132%), GitHub (123%), EPAM (117%) and JetBrains (111%).

While 10 companies dropped in ranking, only five saw a reduction in the number of Active Contributors – Samsung, Cisco Systems, ThoughtWorks, Andela and WSO2. This means that five companies dropped in ranking significantly despite increasing their number of Active Contributors, which highlights that it is not enough to simply keep the number of Active Contributors stable. Companies need to grow that base significantly to maintain their position.

Rising Stars Beyond the Top 30 (2016-2019)

We’d be remiss if we didn’t note the incredible growth that we’ve seen from other companies in their Active Contributors since 2016. While these companies did not rank in the top 30 at the end of 2019, we expect their positions to change, and we’ll likely see a few of them enter the ranking over the next year.

If you are interested in learning more about our methodology or speaking with us, please contact OSCI@epam.com.

Patrick Stephens joined EPAM Systems in 2001. In his many years at EPAM, Patrick has worked from various locations, including the UK, US, Spain, Malaysia and Ireland, to deliver projects for some of our biggest clients. His interest in free and open source software (FOSS) and the development of OSCI grew out of seeing how much open source activity is done at EPAM and recognizing the potential to support it.
