
Hadoop Is on the Decline

Published: 2022-06-09
For a long time, Hadoop was everywhere, almost a synonym for big data. Three years ago, looking beyond Hadoop seemed unthinkable. Three years on, that has started to change.

Author: George Hill, editor-in-chief of the business publication Innovation Enterprise and co-founder of The Cyclist. Translated by toypipi, 中山狼, 薯片番茄, and 班纳睿 of 可译网.


Back in 2012, SiliconANGLE analyzed Twitter conversations among big data professionals and found that they actually talked about NoSQL technologies such as MongoDB as much as, or more than, Hadoop. This suggests that, at least among data scientists, Hadoop was never quite the byword for big data that many assumed it was.

Most would still argue that Hadoop has been one of the single most important technologies behind the spread of big data, the foundation on which much of today's data work is built, and new uses are still being found for it, in warehousing for instance. That said, to the surprise of many, its adoption appears to have more or less stagnated. As James Kobielus, Big Data Evangelist at IBM Software, put it: "Hadoop declined more rapidly in 2016 from the big-data landscape than I expected."

The reasons are hard to pin down, but may come down to a problem common in data circles. A 2015 Gartner study found that 54% of companies had no plans to invest in Hadoop, while 44% had already adopted it or planned to within the next two years. Depending on your point of view, this could mean either that Hadoop is set to expand further or that the majority are simply ignoring it. The survey also revealed other telling factors, whose effects are unlikely to have subsided since. Of those not investing, 49% were still trying to figure out how to extract value from Hadoop, while 57% cited the skills gap as the main obstacle, a problem that will not be fixed overnight. This matches Indeed's job-trend data for the term "Hadoop Testing", which appeared in a peak of 0.061% of ads in mid-2014 and rose to 0.087% by late 2016, an increase of around 43% in 18 months.
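As a quick sanity check on that last figure, the quoted growth rate follows directly from the two ad-share percentages (the variable names below are illustrative, not from the source):

```python
# Share of job ads featuring "Hadoop Testing", per the Indeed figures above.
mid_2014 = 0.061   # percent of ads, mid-2014
late_2016 = 0.087  # percent of ads, late-2016

# Relative growth between the two data points.
growth = (late_2016 - mid_2014) / mid_2014
print(f"{growth:.1%}")  # prints 42.6%, consistent with the "around 43%" cited
```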

This may signal that adoption has not dropped as far as anecdotal evidence suggests; rather, companies are simply finding it hard to extract value from Hadoop with their existing teams, and they need deeper expertise.

Another cause for concern is simply that one person's big data is another person's small data. Hadoop was designed for huge data volumes. As Kashif Saiyed wrote on KDnuggets: "You don't need Hadoop if you don't really have a problem of huge data volumes in your enterprise, so hundreds of enterprises were hugely disappointed by their useless 2 to 10TB Hadoop clusters – Hadoop technology just doesn't shine at this scale."

Most companies do not currently have enough data to warrant a Hadoop rollout, but deployed it anyway because they felt they needed to keep up with the Joneses. After a few years of experimentation and working alongside genuine data scientists, they soon realized that their data worked better in other technologies.

This trend has had effects beyond slowing the adoption of an open-source platform; for some companies it has had real financial consequences. Cloudera and Hortonworks are two of the biggest companies building their products on the Hadoop framework, and both have lost significant value in part due to its decline: Cloudera is reported to have lost 40%, while Hortonworks' share price has plummeted 68% since mid-2015.

The criticism in this article may seem harsh on Hadoop, but it is not the platform itself that has caused the current problems. Rather, it is perhaps the hype and the association with big data that have done the real damage. Companies adopted the platform without understanding it, then failed to get the right people or data to make it work properly, which has led to disillusionment and its apparent stagnation. There is still a huge amount of life in Hadoop; people just need to understand it better.
