Version 1.1.0 of the MongoDB Spark Connector has been released, along with the MongoDB Spark Connector 2.0.0-rc0, which brings Spark 2.0 support.
This is the first release after the 1.0.0 connector and contains some API improvements and updates based on feedback from users. Many thanks to all those who have provided feedback, whether through the MongoDB User mailing list, via Stack Overflow, or via the Spark Jira project.
It’s been thrilling to get such great feedback and find out about some of the real world scenarios the connector has been used for. One of my favourites so far has been how China Eastern Airlines uses the connector to save time and money. But whether you’re a big or small user of the connector, I’d really appreciate your feedback and comments. It really is central to making this connector even better and more accessible.
Improvements in 1.1.0
- Saving DataFrames with an `_id` field will update the matching documents in place, rather than raising an error.
- You can now use SQL to `INSERT INTO` a collection.
- Added support for Spark MapTypes in schemas.
- The `IsNotNull` filter has been improved so that it also checks that the field exists.
- Added helpers for defining the schemas and querying unsupported MongoDB datatypes.
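A short sketch of the first two improvements above, assuming a spark-shell started with the 1.1.0 connector package and `spark.mongodb.input.uri`/`spark.mongodb.output.uri` configured; the collection, URI, and field names are illustrative, not from the release notes.

```scala
import com.mongodb.spark._

// Load the configured collection as a DataFrame. Rows read this way carry
// their _id field, so writing them back now replaces the matching documents
// in place instead of failing with a duplicate-key error.
val df = MongoSpark.load(sqlContext)
MongoSpark.save(df.filter("age > 100"))

// SQL INSERT INTO: expose a MongoDB collection as a temporary table using
// Spark SQL's standard CREATE TEMPORARY TABLE ... USING syntax, then insert.
sqlContext.sql(
  """CREATE TEMPORARY TABLE characters
    |USING com.mongodb.spark.sql
    |OPTIONS (uri 'mongodb://localhost/test.characters')""".stripMargin)
sqlContext.sql("INSERT INTO TABLE characters SELECT 'Gandalf' AS name, 1000 AS age")
```

This relies on a running MongoDB deployment and the `sqlContext` provided by the spark-shell, so treat it as a starting point rather than a copy-paste recipe.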
See the full changelog for detailed information and links to the Jira tickets.
> $SPARK_HOME/bin/spark-shell --packages org.mongodb.spark:mongo-spark-connector_2.10:1.1.0
Spark 2.0 support
The 2.0.0-rc0 connector is available from Maven Central and provides support for Spark 2.0, as well as all the improvements from the 1.1.0 generation of the connector.
There were a few minor API changes required to support Spark 2.0:
- DataFrame and Dataset are now unified: in Scala and Java, DataFrame is simply a type alias for Dataset of Row.
- SparkSession is the new entry point, replacing the old SQLContext and HiveContext for the DataFrame and Dataset APIs.
The actual code changes to interact with MongoDB should be minimal and are designed to be as unobtrusive as possible.
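To illustrate, here is a minimal sketch of reading from MongoDB through the new SparkSession entry point, assuming the 2.0.0-rc0 connector is on the classpath and a local MongoDB deployment; the application name, URIs, and collection are placeholders.

```scala
import com.mongodb.spark._
import org.apache.spark.sql.SparkSession

// SparkSession replaces SQLContext/HiveContext as the single entry point.
val spark = SparkSession.builder()
  .master("local")
  .appName("MongoSparkConnectorIntro")
  .config("spark.mongodb.input.uri", "mongodb://localhost/test.characters")
  .config("spark.mongodb.output.uri", "mongodb://localhost/test.characters")
  .getOrCreate()

// With DataFrame now just an alias for Dataset[Row], the connector hands
// back a Dataset-compatible DataFrame directly from the session.
val df = MongoSpark.load(spark)
df.printSchema()
```

Compared with the 1.x examples, only the entry point changes; the `MongoSpark.load`/`save` calls keep the same shape.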