Hacker News new | past | comments | ask | show | jobs | submit login

> Store raw data if possible. This allows you to condense it later.

I have some daily scripts reading from an http endpoint, and I can't really decide what to do when it returns html instead of json. Should I store the HTML as it is "raw data" or should I just dismiss it? The API in question has a tendency to return 200 with a webpage saying that the API can't be reached (typically because of a time out)




I wouldn't store that usually, I'd use that to trigger retries.

For you storing the raw data is storing the json that http endpoint returns rather than something like

    let content = get(url).json()
    info_i_care_about = content['data']['title']
    store(info_i_care_about)
as otherwise you'll get stuck when the json response moves the title to data.metadata.title or whatever

It's usually less of an issue with structured data, things like html change more often, but keeping that raw data means you can process it in various different ways later.

You also decouple errors so your parsing error doesn't stop your write from happening.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: