
Posts

Showing posts from 2017

Optimizing Spark Job Performance With Apache Ignite (Part 1)

Portions of this article were taken from the book The Apache Ignite Book. If it gets you interested, check out the rest of the book for more helpful information. The new book also covers Spark DataFrames. Apache Ignite offers several ways to improve a Spark job's performance. One is Ignite RDD, which represents an Ignite cache as a Spark RDD abstraction. Another is Ignite IGFS, an in-memory file system that can be transparently plugged into Spark deployments. Ignite RDD makes it easy to share state in memory between different Spark jobs or applications. With Ignite's in-memory shared RDDs, any Spark job can put data into an Ignite cache, and other Spark jobs can access that data later. Ignite RDD is implemented as a view over the Ignite distributed cache, which can be deployed either within the Spark job execution process or on a Spark worker. Before we move on to more advanced topics, let's look at the history of Spark and the kinds of problems Ignite RDDs can solve.
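The state-sharing model described above can be sketched in plain Java. This is a toy model under stated assumptions, not the Ignite or Spark API. A static `SharedCacheRegistry` (a hypothetical name) plays the role of the Ignite cluster, and two methods stand in for two separate Spark jobs. In the real Spark integration, the cache is exposed to Spark as an `IgniteRDD`: you obtain it from an `IgniteContext` with `fromCache(...)` and write to it with `savePairs(...)`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical stand-in for Ignite: named caches that live outside any single "job".
// In a real deployment these would be Ignite caches on Ignite nodes, not a static map.
final class SharedCacheRegistry {
    private static final Map<String, Map<Integer, Integer>> CACHES = new ConcurrentHashMap<>();

    static Map<Integer, Integer> cache(String name) {
        // Create the named cache on first access, then hand the same instance to every caller
        return CACHES.computeIfAbsent(name, n -> new ConcurrentHashMap<>());
    }
}

final class SharedStateJobs {
    // "Job 1": computes (i, i*i) pairs in parallel and saves them into the shared cache
    static void producerJob(String cacheName, int n) {
        Map<Integer, Integer> cache = SharedCacheRegistry.cache(cacheName);
        IntStream.rangeClosed(1, n).parallel().forEach(i -> cache.put(i, i * i));
    }

    // "Job 2": runs later, reads the same cache and filters it without recomputing anything
    static Map<Integer, Integer> consumerJob(String cacheName, int threshold) {
        return SharedCacheRegistry.cache(cacheName).entrySet().stream()
                .filter(e -> e.getValue() > threshold)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }
}
```

The cache outlives the producer, so the consumer reads the squares instead of recomputing them. That is the property that lets separate Spark applications share state through Ignite.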

Book review: High-Performance In-Memory Computing With Apache Ignite, by Sadruddin Md

A new title, The Apache Ignite Book, has been released, covering Ignite 2.6 and above. Read the full book review by Sadruddin Md.

Apache Ignite Native Persistence: A Brief Overview

Portions of this article were taken from the book The Apache Ignite Book. If it gets you interested, check out the rest of the book for more helpful information. In-memory approaches can achieve blazing speed by putting the working set of the data into system memory. When all data is kept in memory, the issues that arise from using traditional spinning disks disappear. This means, for instance, that there is no need to maintain additional cache copies of the data and keep them synchronized. But this approach also has a downside: because the data lives only in memory, it will not survive if the whole cluster is terminated. Therefore, these types of data stores are not considered persistent at all. In this blog post, I will explore Apache Ignite's new native persistence feature and give a clear, understandable picture of how it works. In most cases, you can’t (and shouldn’t) store the whole data
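To make the durability problem concrete, here is a minimal sketch of the write-ahead-logging idea behind disk-backed persistence. Each update is logged to disk before it is applied in memory, and the log is replayed after a restart. Ignite's native persistence does use a write-ahead log together with periodic checkpointing of memory pages to disk. The `WalKeyValueStore` class below, however, is a hypothetical single-node toy and not Ignite's implementation. For brevity, it assumes keys and values never contain tab or newline characters.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Toy key/value store: every update is appended to a write-ahead log (WAL) on disk
// before it is applied in memory, so the in-memory state can be rebuilt after a restart.
// Assumption: keys and values contain no tab or newline characters.
final class WalKeyValueStore {
    private final Path walFile;
    private final Map<String, String> memory = new HashMap<>();

    WalKeyValueStore(Path walFile) {
        this.walFile = walFile;
        recover();
    }

    void put(String key, String value) {
        byte[] record = (key + "\t" + value + "\n").getBytes(StandardCharsets.UTF_8);
        try {
            // SYNC requires the write to reach the storage device before returning
            Files.write(walFile, record, StandardOpenOption.CREATE,
                    StandardOpenOption.APPEND, StandardOpenOption.SYNC);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        memory.put(key, value); // apply in memory only after the log write succeeded
    }

    String get(String key) {
        return memory.get(key);
    }

    // Replays the log in order; later records for the same key overwrite earlier ones
    private void recover() {
        if (!Files.exists(walFile)) {
            return;
        }
        try {
            for (String line : Files.readAllLines(walFile, StandardCharsets.UTF_8)) {
                int tab = line.indexOf('\t');
                if (tab >= 0) {
                    memory.put(line.substring(0, tab), line.substring(tab + 1));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A real store would also checkpoint its in-memory state to disk periodically, so the log does not have to be replayed from the very beginning. This toy version skips that step.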

Apache Ignite with Spring Data

See more details in The Apache Ignite Book. Spring Data provides a unified and easy way to access different kinds of persistence stores, both relational database systems and NoSQL data stores. It sits on top of JPA, adding another layer of abstraction and defining a standards-based design to support the persistence layer in a Spring context. Apache Ignite's IgniteRepository extends the Spring Data CrudRepository interface and builds on its basic capabilities. It supports basic CRUD operations on a repository for a specific type, as well as access to the Apache Ignite SQL grid through the Spring Data API. With Spring Data's repositories, you only need to write an interface with finder methods to query the objects. All the CRUD methods for manipulating the objects are provided automatically. As an example: @RepositoryConfig(cacheName = "DogCache") public interface DogRepository extends IgniteRepository<Dog, Long> { List<Dog>
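The "write only the interface" idea can be illustrated without Spring or Ignite on the classpath. The sketch below is a toy built on several assumptions. `SimpleCrudRepository`, `DogRepo`, `Dog` and `RepositoryFactory` are hypothetical names, and the method set is deliberately tiny. It uses a JDK dynamic proxy (`java.lang.reflect.Proxy`) to generate the repository implementation at runtime. It also derives a `findByBreed` query from the method name, roughly the way Spring Data derives queries from finder names. The real Spring Data and `IgniteRepository` machinery is far more complete.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, heavily simplified analogue of a CRUD repository contract.
interface SimpleCrudRepository<T, ID> {
    T save(ID id, T entity);
    Optional<T> findById(ID id);
    long count();
}

// Hypothetical entity used only for this illustration.
final class Dog {
    private final String name;
    private final String breed;

    Dog(String name, String breed) {
        this.name = name;
        this.breed = breed;
    }

    public String getName() { return name; }
    public String getBreed() { return breed; }
}

// The only code the "user" writes: an interface with a finder method.
interface DogRepo extends SimpleCrudRepository<Dog, Long> {
    List<Dog> findByBreed(String breed);
}

final class RepositoryFactory {
    // Generates an in-memory implementation of a repository interface at runtime.
    @SuppressWarnings("unchecked")
    static <R> R create(Class<R> repoType) {
        Map<Object, Object> store = new ConcurrentHashMap<>();
        return (R) Proxy.newProxyInstance(repoType.getClassLoader(), new Class<?>[] {repoType},
                (proxy, method, args) -> {
                    String name = method.getName();
                    if (method.getDeclaringClass() == Object.class) {
                        // equals/hashCode/toString called on the proxy itself
                        switch (name) {
                            case "equals": return proxy == args[0];
                            case "hashCode": return System.identityHashCode(proxy);
                            default: return repoType.getSimpleName() + "Proxy";
                        }
                    }
                    switch (name) {
                        case "save":
                            store.put(args[0], args[1]);
                            return args[1];
                        case "findById":
                            return Optional.ofNullable(store.get(args[0]));
                        case "count":
                            return (long) store.size();
                        default:
                            break;
                    }
                    if (name.startsWith("findBy")) {
                        // Derive a getter from the finder name: findByBreed -> getBreed
                        String getterName = "get" + name.substring("findBy".length());
                        List<Object> matches = new ArrayList<>();
                        for (Object entity : store.values()) {
                            Method getter = entity.getClass().getMethod(getterName);
                            getter.setAccessible(true);
                            if (args[0].equals(getter.invoke(entity))) {
                                matches.add(entity);
                            }
                        }
                        return matches;
                    }
                    throw new UnsupportedOperationException("Unsupported method: " + name);
                });
    }
}
```

Calling `findByBreed("Labrador")` on the generated proxy returns every saved dog whose `getBreed()` equals the argument. Spring Data's real query derivation understands many more keywords (for example `And`, `Or`, `OrderBy`), which this toy does not attempt.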

In-Memory MapReduce and Your Hadoop Ecosystem (Part 2)

Portions of this article were taken from the book High-Performance In-Memory Computing With Apache Ignite. If it gets you interested, check out the rest of the book for more helpful information. Before reading, be sure to check out Part 1! Apache Ignite provides a vanilla distributed in-memory file system called the Ignite File System (IGFS), with functionality similar to Hadoop HDFS. This is one of the unique features of Apache Ignite that helps accelerate Big Data computing. IGFS implements the Hadoop file system API and is designed to support both Hadoop v1 and YARN-based Hadoop v2. Ignite IGFS can be transparently plugged into Hadoop or Spark deployments. One of the greatest benefits of IGFS is that it does away with the Hadoop NameNode in a Hadoop deployment. Instead, it seamlessly uses Ignite's in-memory database under the hood to provide fully automatic scaling and failover without any additional shared storage. IGFS uses memory instead of disk to produce a distributed, fault-tolerant, and
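Here is a toy, single-JVM sketch that makes "no NameNode" concrete. File metadata and fixed-size data blocks both live in ordinary key/value maps, so no separate metadata server is needed. This mirrors, very loosely, how IGFS kept its metadata and file blocks in Ignite caches. `ToyInMemoryFs` and its methods are hypothetical and are not part of the IGFS or Hadoop `FileSystem` API.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy in-memory file system: both the metadata and the file blocks are stored in
// key/value maps (stand-ins for distributed caches), so no separate NameNode exists.
final class ToyInMemoryFs {
    private final int blockSize;
    // Metadata map: path -> file length in bytes
    private final Map<String, Long> meta = new ConcurrentHashMap<>();
    // Data map: "path#blockIndex" -> bytes of that block
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();

    ToyInMemoryFs(int blockSize) {
        if (blockSize <= 0) throw new IllegalArgumentException("blockSize must be positive");
        this.blockSize = blockSize;
    }

    // Splits the content into fixed-size blocks, stores them, then records the length.
    void write(String path, byte[] content) {
        int blocks = blocksFor(content.length);
        for (int i = 0; i < blocks; i++) {
            int from = i * blockSize;
            int to = Math.min(from + blockSize, content.length);
            data.put(blockKey(path, i), Arrays.copyOfRange(content, from, to));
        }
        meta.put(path, (long) content.length);
    }

    // Looks up the length in the metadata map and reassembles the blocks in order.
    byte[] read(String path) {
        Long length = meta.get(path);
        if (length == null) throw new IllegalArgumentException("No such file: " + path);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < blocksFor(length); i++) {
            byte[] block = data.get(blockKey(path, i));
            out.write(block, 0, block.length);
        }
        return out.toByteArray();
    }

    // Number of blocks a stored file occupies (0 if the path is unknown).
    int blockCount(String path) {
        Long length = meta.get(path);
        return length == null ? 0 : blocksFor(length);
    }

    private int blocksFor(long length) {
        return (int) ((length + blockSize - 1) / blockSize);
    }

    private static String blockKey(String path, int index) {
        return path + "#" + index;
    }
}
```

In a real cluster the equivalent caches are distributed and replicated across nodes. This toy keeps everything in one process and has no fault tolerance.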