Sooner or later every company that strives to be data-driven stumbles across a problem of having more data than can be analyzed without a specialized position devoted to this task. Here is how we tackled this challenge at Base.
Tagged in ‘data science’
DataFrames are a great abstraction for working with structured and semi-structured data. Recently they were introduced in Spark and made large scale data science much easier. How do those new, shiny, distributed Spark DataFrames compare to Pandas, established single-machine tool for data analysis? Let’s find out!