Apache Spark - Are cached RDDs resilient to graceful worker shutdown?
I have a (very) small Spark cluster used as a 'sandpit' environment by several people. Occasionally, we need to restart worker nodes in the course of maintaining the cluster.

If a running job is working off an RDD that has been .cache()'d, and a worker is stopped gracefully (by running ./stop-slave.sh on that node), what happens to the portion of the cached RDD held by that worker?
The two scenarios I can think of (assuming the RDD's storage level is MEMORY_ONLY, with no replication) are that:
- the worker redistributes its portion of the RDD across the other workers; or
- the portion of the RDD held by that worker is lost, and must be recomputed.
The documentation suggests that lost partitions are recomputed, but it's unclear whether that covers a 'graceful' worker shutdown.
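To make the second scenario concrete, here is a toy, non-Spark sketch (plain Python, all names hypothetical) of the lineage-based recomputation semantics the documentation describes for a MEMORY_ONLY RDD with no replication: a lost cached partition is simply dropped and rebuilt from its lineage on next access, rather than migrated to another worker.

```python
# Toy model (NOT Spark itself) of recompute-on-loss for a cached,
# MEMORY_ONLY RDD with no replication.

def compute_partition(i):
    # Stand-in for the RDD's lineage: a deterministic way to rebuild
    # partition i from scratch (here, squares of a slice of integers).
    return [x * x for x in range(i * 4, (i + 1) * 4)]

# All three partitions start out cached in worker memory.
cache = {i: compute_partition(i) for i in range(3)}

# The worker holding partition 1 shuts down: its in-memory block is
# simply dropped; nothing redistributes it to the other workers.
del cache[1]

def get_partition(i):
    # On access: cache hit if present, else recompute from lineage.
    if i not in cache:
        cache[i] = compute_partition(i)
    return cache[i]

# The next job over the RDD still sees complete, correct data; the
# lost partition just costs a recomputation.
result = [get_partition(i) for i in range(3)]
```

This mirrors the documented fault-tolerance model: correctness comes from the lineage, not from the cache, so losing cached blocks (gracefully or not) costs recomputation time but not data.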