Apache Hadoop® is an open-source platform that provides highly reliable, scalable, distributed processing of large data sets using simple programming models. Hadoop runs on clusters of commodity computers, offering a cost-effective way to store and process massive amounts of structured, semi-structured and unstructured data without imposing format requirements. This makes Hadoop well suited to building data lakes that support big data analytics initiatives.
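In practice, the "simple programming models" typically means MapReduce (alongside other engines that run on the cluster). As a concrete illustration, the sketch below is a minimal word-count job written against Hadoop's Java MapReduce API, modelled on the standard Hadoop tutorial example: the mapper emits a count of 1 per word, the reducer sums those counts, and the input and output HDFS paths are taken from the command line (the class name and paths are illustrative only).

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emit (word, 1) for every token in each input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory (must not exist)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a JAR, a job like this is submitted to the cluster with the hadoop jar command, for example: hadoop jar wordcount.jar WordCount /input /output (paths are illustrative).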
Key Features
Better real-time data-driven decisions
Incorporate emerging data types (streaming audio, video, social media sentiment and clickstream data) along with the semi-structured and unstructured data not traditionally used in a data warehouse. More comprehensive data supports more accurate analytics and better decisions in support of new technologies such as artificial intelligence (AI) and the Internet of Things (IoT).
Improved data access and analysis
Hadoop helps drive real-time, self-service data access for your data scientists, line-of-business (LOB) owners and developers. Hadoop is also helping to fuel the future of data science, an interdisciplinary field that combines machine learning, statistics, advanced analytics and programming.
Data offload and consolidation
Optimize and streamline costs in your enterprise data warehouse by moving “cold” data that is not in active use to a Hadoop-based distribution. Alternatively, consolidate data from across the organization to increase accessibility, decrease costs and drive more accurate data-driven decisions.
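As a rough sketch of what a file-level offload can look like, the snippet below uses Hadoop's FileSystem API to copy a local export of cold warehouse data into HDFS. The NameNode address and the source and target paths are hypothetical, and real offloads are more often driven by dedicated tools such as DistCp or Sqoop; this only illustrates the basic idea under those assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ColdDataOffload {
  public static void main(String[] args) throws Exception {
    // Point the client at the cluster; the fs.defaultFS value is a hypothetical example.
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

    try (FileSystem fs = FileSystem.get(conf)) {
      // Hypothetical paths: a local export of cold warehouse data and its HDFS destination.
      Path localExport = new Path("/exports/warehouse/cold/2019/");
      Path hdfsTarget = new Path("/data/warehouse_archive/2019/");

      // Copy the export into HDFS (delSrc=false keeps the local files, overwrite=true replaces any existing target).
      fs.copyFromLocalFile(false, true, localExport, hdfsTarget);
      System.out.println("Offloaded " + localExport + " to " + fs.getUri() + hdfsTarget);
    }
  }
}
```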