
There are a lot of great Python and Pandas code snippets here, but I am not sure anyone has posted a NumPy-based solution. Below is mine. It gives a 100x speedup over the base (quadratic Python) solution. Still, the suggested pure Python solutions and idiomatic NumPy solutions are much faster. I suspect NumPy has more power than that.

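    import time
    import numpy as np

    def gen_stats_numpy_l(dataset_numpy):
        start = time.time()
        # First-occurrence index of each product id (column 0).
        # Splitting at these indices assumes rows are sorted by product id.
        unique_products, unique_indices = np.unique(dataset_numpy[:, 0], return_index=True)
        product_stats = []
        # The first index is 0, so the first chunk is empty; skip it.
        split = np.split(dataset_numpy, unique_indices)[1:]
        for item in split:
            length = len(item)
            product_stats.append([
                int(item[0, 0]),                                  # product id
                int(length),                                      # number of rows
                int(np.sum(item[:, 2])),                          # sum of column 2
                float(np.round(np.sum(item[:, 3]) / length, 2)),  # mean of column 3
            ])
        end = time.time()
        working_time = end - start
        return product_stats, working_time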
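One way to push more of the work into NumPy might be to drop the Python-level loop over the split chunks and use `np.add.reduceat` for the per-product sums. This is only a rough sketch, not a benchmarked result. The function name `gen_stats_reduceat` is made up for illustration, and it assumes (like the version above) that rows are sorted by product id in column 0.

```python
import numpy as np

def gen_stats_reduceat(dataset_numpy):
    """Return [id, row count, sum of column 2, mean of column 3] per product.

    Assumption: rows are sorted by product id (column 0), so each product
    occupies one contiguous block of rows.
    """
    # starts[i] is the first row of product ids[i]; counts[i] is its row count.
    ids, starts, counts = np.unique(
        dataset_numpy[:, 0], return_index=True, return_counts=True
    )
    # reduceat sums each slice starts[i]:starts[i+1] (the last runs to the end).
    col2_sums = np.add.reduceat(dataset_numpy[:, 2], starts)
    col3_means = np.round(np.add.reduceat(dataset_numpy[:, 3], starts) / counts, 2)
    # Convert to plain Python types to match the output shape of the loop version.
    return [
        [int(i), int(n), int(s), float(m)]
        for i, n, s, m in zip(ids, counts, col2_sums, col3_means)
    ]
```

The only remaining Python loop is the final conversion to a list of lists. Whether this actually beats the split-based version would need to be measured on the real dataset.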



