Hacker News
Announcing flyio, an R package to interact with data in the cloud (socialcops.com)
45 points by akashtndn on Dec 16, 2018 | 6 comments

The R ecosystem around cloud services and their APIs still seems immature, so it’s great to see folks working on packages in the space.

I’m not 100% sure whether this provides any functionality not already offered by existing cloudyr projects or just wraps them in a new API. I think either is fine, but it would help to better understand why you’d want to use flyio vs., say, aws.s3 or the like.

Also, there are some aspects of the API that make me a little itchy. If I’m reading the examples correctly, it seems like flyio_set_datasource sets a global variable and then there are generic functions like list_files that do different things based on that global state?

That seems risky to me, and a more idiomatic approach to this would be to have a function that returns a handle object representing a Google Cloud or AWS service, then have generic functions take that handle and dispatch to appropriate methods.
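A minimal sketch of the handle-plus-generics approach described above, using base R's S3 dispatch. Every name here (`local_store`, `s3_store`, `store_list_files`) is hypothetical and chosen for illustration; none of it is flyio's actual API, and the S3 method is a stub rather than a real cloud call.

```r
# Hypothetical constructors: each returns a handle tagged with an S3 class.
# These names are made up for illustration and are not part of flyio.
local_store <- function(root) {
  structure(list(root = root), class = c("local_store", "store"))
}

s3_store <- function(bucket) {
  structure(list(bucket = bucket), class = c("s3_store", "store"))
}

# Generic: dispatches on the class of the handle, not on global state.
store_list_files <- function(store, ...) {
  UseMethod("store_list_files")
}

# Method for local directories: lists the files under the handle's root.
store_list_files.local_store <- function(store, ...) {
  list.files(store$root)
}

# Method for S3 handles: a stub. A real method would call a cloud client
# here (for example, one of the cloudyr packages).
store_list_files.s3_store <- function(store, ...) {
  stop("S3 listing is not implemented in this sketch")
}

# Usage: the backend is explicit at every call site.
demo_dir <- tempfile("store_demo_")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("a.csv", "b.csv")))

local_handle <- local_store(demo_dir)
files <- store_list_files(local_handle)
```

Because the backend travels with the handle, two handles for different services can coexist in one session, and no call depends on whichever data source was set most recently.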

Even then, namespacing in R isn’t really a thing, and I worry that really plain function names like list_files or export_file are likely to get clobbered by other packages using names like that. For packages like readr that are intended to actually replace large swaths of IO functions, that’s fine. But I’m not sure it makes sense for a more specialized package like this.

Despite that, I do appreciate you all creating and open sourcing this. Like I mentioned, any work on cloud packages is welcome from my perspective! Interested to see how this develops.

Isn't the main USP of flyio the cloud-agnostic part? You can switch between local, Google, and Amazon storage without changing the code.

A dplyr abstraction for Elasticsearch I tried to use this year turned out to be orphanware. I'm back to making HTTP requests directly, posting a canned JSON blob.

If this one works, I'd love some positive feedback signals from users, because I don't want to build hopes, or code, to an interface spec which turns out to wither and die.

The name already seems to be used by a Node.js module.

There’s a bigrquery package on CRAN by Hadley Wickham that seems to have already covered this use case.

bigrquery is for a completely different use: it's a wrapper for Google BigQuery. This seems to be a cloud-agnostic way to read and write using AWS/GCP. The cloudyr packages seem to be the closest.

Congratulations, SocialCops team!
