Metadata

Highlights

So people using it interactively would need to remember this.

After the data has been parsed once, we’d like to save the data in its parsed form on the cluster so that we don’t have to reparse it every time we want to ask a new question. Spark supports this use case by allowing us to signal that a given DataFrame should be cached in memory after it is generated by calling the cache method on the instance. Let’s do that now for the parsed DataFrame: parsed.cache()
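
A minimal sketch of the caching step in context may help here. It assumes a local Spark session and a hypothetical CSV file at `data/input.csv`; the object name `CacheSketch`, the file path, and the read options are illustrative and not taken from the book. Only the name `parsed` and the `cache()` call come from the highlight.

```scala
import org.apache.spark.sql.SparkSession

object CacheSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimentation; a real job would
    // normally get its master from the cluster configuration.
    val spark = SparkSession.builder()
      .appName("cache-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical input file; substitute the data set you are parsing.
    val parsed = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("data/input.csv")

    // Mark the DataFrame for caching. This call is lazy: nothing is
    // stored until an action forces the DataFrame to be computed.
    parsed.cache()

    // The first action parses the file and populates the cache.
    val rowCount = parsed.count()
    println(s"rows: $rowCount")

    // Later actions can reuse the cached data instead of reparsing.
    parsed.show(5)

    // Release the cached data once it is no longer needed.
    parsed.unpersist()
    spark.stop()
  }
}
```

Because `cache()` only marks the DataFrame and the data is stored on the first action, it is worth calling it before the first query in an interactive session rather than after.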