27 Jun 2016, 22:10


The new MongoDB Spark connector has been released!

Last month I announced that the new Spark connector for MongoDB was in beta. After some invaluable testing by the community, I’m excited to announce that the first official release is now available from spark-packages:

> $SPARK_HOME/bin/spark-shell --packages org.mongodb.spark:mongo-spark-connector_2.10:1.0.0
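
The examples that follow pick up their connection details from the SparkConf, so it helps to start the shell with the input and output URIs already set. A minimal sketch, assuming a local mongod and a test database (the URIs below are placeholders for your own deployment):

> $SPARK_HOME/bin/spark-shell \
    --conf "spark.mongodb.input.uri=mongodb://127.0.0.1/test.characters" \
    --conf "spark.mongodb.output.uri=mongodb://127.0.0.1/test.characters" \
    --packages org.mongodb.spark:mongo-spark-connector_2.10:1.0.0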

A clean, simple connector.

At MongoDB we’ve been listening to your feedback about what you would like to see in a new MongoDB connector. With that in mind, we’ve written a completely new, idiomatic connector for Spark:

import com.mongodb.spark._
import com.mongodb.spark.sql._

// Loading data is simple:
val rdd = sc.loadFromMongoDB()     // Uses the SparkConf for configuration
println(rdd.count)
println(rdd.first.toJson)

// DataFrames and Datasets made simple:
// Infers the schema (samples the collection)
val df = sqlContext.loadFromMongoDB().toDF()
df.filter(df("age") < 100).show()  // Passes filter to MongoDB

// Schema provided explicitly via a case class
case class Character(name: String, age: Int)  // fields match those used below
val dataframeExplicit = sqlContext.loadFromMongoDB().toDF[Character]()
val dataSet = sqlContext.loadFromMongoDB().toDS[Character]()

// Writing data to MongoDB is also easy:
dataframeExplicit.registerTempTable("characters")  // expose the DataFrame to Spark SQL
val centenarians = sqlContext.sql("SELECT name, age FROM characters WHERE age >= 100")
centenarians.write.option("collection", "hundredClub").mongo()
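
The implicit helpers also cover saving plain RDDs of Documents via saveToMongoDB(). A rough sketch with made-up data, assuming the target database and collection come from the SparkConf output URI unless overridden:

import org.bson.Document

// Build a throwaway RDD of Documents and write it out to MongoDB
val docs = sc.parallelize((1 to 10).map(i => Document.parse(s"{ spark: $i }")))
docs.saveToMongoDB()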

More examples and full documentation can be found on the documentation site.

Feedback wanted

We would love to hear your feedback on the new connector, so please feel free to post to the MongoDB User mailing list or add feature requests to the Jira project.

Enjoy!
