Arrow: Better dates and times for Python (crsmithdev.com)
315 points by amarsahinovic on Aug 5, 2014 | 72 comments



Seems like it's written by someone who prefers Ruby or JavaScript -- where there is already a culture of using names which are cute first, even if they are opaque -- over Python. These method naming choices are baffling.

Arrow.to() converts to a new timezone? And .replace() applies a relative offset!? Replacing the hour field of this object with -1 should not return an object having hour=11. Arrow.get() is doing some kind of quadruple duty, none of which I would describe as "getting."

And what about that class name? Arrow as the name of a package is fine, but what do you expect someone to make of <Arrow [...]> -- what's wrong with arrow.DateTime?

Great work on making and releasing something, but this API is surprising -- as in, one would be unable to predict how it works. I will continue using python-dateutil.


Thank you for putting into words my same initial concerns about this library.

Naming should be much clearer than this, with the intention to make code sort-of read like English text as much as possible. Arrow is a fine name[0], but the actual class/object names should be descriptive.

Also, it's not only that .replace() can apply relative time offsets, but the quirky way it signals this, using singular versus plural keyword parameters. It's a "clever" solution, but it's not at all intuitive. When I figured it out, it was mostly a relief that at least .replace() does not always apply relative time offsets (which would have been terrible naming).
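For contrast, here is a minimal sketch of how the standard library keeps "set a field" and "shift by an amount" apart. It uses only `datetime` and is not Arrow's API; the variable names are made up for illustration.

```python
from datetime import datetime, timedelta

noon = datetime(2014, 8, 5, 12, 0)

# Absolute: set the hour field to a specific value.
at_eleven = noon.replace(hour=11)

# Relative: move the whole timestamp back by one hour.
one_hour_earlier = noon + timedelta(hours=-1)

# Both land on 11:00 here, but only the relative form can cross
# a day boundary: midnight minus one hour is 23:00 the day before.
midnight = datetime(2014, 8, 5, 0, 0)
previous_evening = midnight + timedelta(hours=-1)

print(at_eleven, one_hour_earlier, previous_evening)
```

Neither call modifies `noon`; both return new objects.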

I wish it was more pythonic, on the whole, agreeing with the way people expect things to behave in Python modules. It says it's inspired by Requests, which gets that very much right, so I was getting my hopes up a little.

Arrow.get() is totally named the wrong thing. It acts more like a .create(); maybe I would've put that functionality into the constructor. A method called .get() should implement something very similar to what most .get* methods do on standard types and in the standard library, which is to extract part of an object that is a container type (optionally with defaults, etc). Of course that operation does not make sense on an Arrow object (and please don't try), so it probably just shouldn't have a .get() method.

[0] actually it's not. I've found myself accidentally muscle-memory-typing "Array" a couple of times already. I'd probably have renamed this module early on in the process for that reason alone.


I think Arrow still leaves too much confusion around how to handle timezones. That's why I wrote DMC (https://github.com/rhettg/dmc).

Not to be confused with Delorean (https://github.com/myusuf3/delorean) which is a lot like Arrow.


I did not know about DMC and am very happy to see this as I've often desired a simple but proper API for time. I always wondered why so many date+time libraries complicate matters so much.

I strongly agree with the principles you chose - specifically - all time storage and math should be done in UTC (the 'unicode' of time). Timezones only come in when you display or parse times (the 'encodings' of time).

I hope more and more libraries apply this idea. It makes the API a lot simpler and more error-resistant.
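A minimal sketch of that principle using only the standard library: store and compute in UTC, and convert only when rendering. The fixed offsets and the `show` helper are illustrative assumptions; real code would use named zones rather than hard-coded offsets.

```python
from datetime import datetime, timedelta, timezone

# Stored value: always an aware UTC timestamp.
stored = datetime(2014, 8, 5, 17, 30, tzinfo=timezone.utc)

# Arithmetic stays in UTC.
reminder = stored - timedelta(minutes=45)

# Display: convert at the edge. These fixed offsets are stand-ins
# for real named zones and ignore daylight saving rules.
sydney = timezone(timedelta(hours=10), "AEST")
new_york = timezone(timedelta(hours=-4), "EDT")

def show(moment, zone):
    """Render an aware datetime in the given zone (hypothetical helper)."""
    return moment.astimezone(zone).strftime("%Y-%m-%d %H:%M %Z")

print(show(reminder, sydney))
print(show(reminder, new_york))
```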

Good job!


    I strongly agree with the principles you chose - specifically - all time storage and math should be done in UTC
I've had this argument many times before. Throwing away TZ is in fact throwing away important contextual information.

Having worked on a few data cleaning projects to re-purpose old scientific datasets, I can tell you that the quality of these data would have suffered markedly, and many corrections would not have been possible, if the TZ/offsets had been discarded. For example: "The geolocation for these records crosses a TZ boundary here, but the UTC offset doesn't reflect that until here, so that explains the 1hr gap in the record where the user realized her mistake half-way into the new TZ, so we should add an hour to these 860 records."

Yes, it would be better if raw data was recorded in UTC in the first place but when working with such data the ability to make decisions and inferences from the TZ is very useful.


> Yes, it would be better if raw data was recorded in UTC in the first place but when working with such data the ability to make decisions and inferences from the TZ is very useful.

I maintain that time and location are orthogonal. If the application needs both, store both - properly in separate fields. That's better than shoving the granular location in the form of a timezone in the time field.


Yes, from a data modeler's perspective it is definitely orthogonal.

From an evidence-based perspective though, a full "chain of custody" has to be maintained (as far as alterations go); provenance of information can be very important.

If you care about reproducibility, how raw data has been manipulated from the original, you'd better have some record of it.

For date-times almost all public datasets standardize on ISO8601. Any datetimes captured by persons or processes which aren't working in UTC from the very beginning will have the TZ offset in it.

Normalizing to UTC is up to the data consumer to do as they please, but some get very upset if you try to say that an event was recorded as happening at 03:00 if the person or process actually recorded 13:00+10:00. It may seem subtle and boring and pointless, but the fact is that you may be seen to be changing the facts.
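The 13:00+10:00 example can be sketched with the standard library: keep the value exactly as recorded and derive the UTC form next to it, rather than overwriting one with the other. The `record` layout is an assumption made up for illustration.

```python
from datetime import datetime, timezone

recorded = "2014-08-05T13:00:00+10:00"   # exactly what the observer wrote down

original = datetime.fromisoformat(recorded)
normalized = original.astimezone(timezone.utc)

# Keep both: the raw string is the evidence, the UTC value is for math.
record = {"recorded_as": recorded, "utc": normalized.isoformat()}

print(record)
```

Both values denote the same instant, but only `recorded_as` shows what the person or process actually captured.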


No.

It is in fact best to store the local time and the location (i.e., not "+01:00", but for example "Europe/Amsterdam"). Not UTC with or without time zone. For example, if you have an appointment in the future, and the time zone or daylight savings time rules change, this is the only way that the appointment can still be at the correct moment. And time zone rules do change.

And this way you can still represent absolute UTC time if you wish. With another representation, you cannot represent human-society time, which is often what you actually want.
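A rough sketch of why the zone name matters for future events. The `rules` table is a hypothetical stand-in for the tz database, and the rule change is invented; the point is only that a stored wall-clock time plus a zone name can be re-resolved after the rules change, whereas a precomputed UTC value cannot.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical rule table standing in for the tz database.
rules = {"Europe/Amsterdam": timedelta(hours=2)}

# Stored form: wall-clock time plus a zone *name*, not an offset.
appointment = {"local": datetime(2030, 7, 1, 9, 0), "zone": "Europe/Amsterdam"}

def to_utc(appt, rules):
    """Resolve the stored wall-clock time against the rules in force now."""
    offset = rules[appt["zone"]]
    return (appt["local"] - offset).replace(tzinfo=timezone.utc)

before = to_utc(appointment, rules)

# Suppose the zone later abolishes daylight saving time (invented change).
rules["Europe/Amsterdam"] = timedelta(hours=1)
after = to_utc(appointment, rules)

print(before, after)   # the appointment is still at 09:00 local either way
```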


Good point. However this is a specific use case requiring future local time, and as you mention, even TZ aware local time fails here. In this case you cannot come up with a guaranteed physical time in the future that the event will take place.

IOW, I think storing local time and location as a rule is not a good idea. The rule should be to use UTC and store location if needed. Only if that doesn't work (e.g. '5 minutes after the game is over') come up with some other representation.


One extra thing (one could call it a weakness) with the local+location scheme is that you will need to have a way to distinguish the "repeated" hour after daylight savings time sets the clock backward one hour from the hour before it.
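Python's `datetime` has a `fold` attribute for exactly this repeated hour. The sketch below uses a toy `tzinfo` subclass (`ToyZone`, a made-up zone with a single fall-back transition) so that it runs without a tz database; real code would use a proper zone implementation.

```python
from datetime import datetime, timedelta, timezone, tzinfo

class ToyZone(tzinfo):
    """Hypothetical zone: UTC+2 until 03:00 local on 2014-10-26, when
    clocks go back to 02:00 and the offset becomes UTC+1."""

    FALL_BACK = datetime(2014, 10, 26, 3, 0)

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < self.FALL_BACK - timedelta(hours=1):
            return timedelta(hours=2)
        if wall >= self.FALL_BACK:
            return timedelta(hours=1)
        # 02:00-02:59 occurs twice; fold=1 means the second occurrence.
        return timedelta(hours=1) if dt.fold else timedelta(hours=2)

    def dst(self, dt):
        return self.utcoffset(dt) - timedelta(hours=1)

    def tzname(self, dt):
        return "TOY"

first = datetime(2014, 10, 26, 2, 30, tzinfo=ToyZone())
second = first.replace(fold=1)   # same wall clock, one hour later in UTC

print(first.astimezone(timezone.utc))
print(second.astimezone(timezone.utc))
```

So a local+location store needs one extra bit (here, `fold`) to tell the two 02:30s apart.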


The issue here seems to be two different definitions of time. There's "physical" time, i.e. a continuous line and there's human/calendar time which is different and is most definitely tied to a location. For an example, see http://en.wikipedia.org/wiki/February_30#Swedish_calendar . (No wonder that time zones complicate things enormously.)


Amen to this.

We've got a complete clusterfuck in our legacy code because somebody decided that UTC was not the right answer, nor was timezones quite, and so somehow made the heinous mistake of creating a "local UTC" and separately storing the TZ. The mind boggles.

I'm still straightening that mess out.

People, computers care not for your "calendar time"--usually, you want physical time and can fix the rest at the view level.


Doing all storage and computation in UTC does not work if you are dealing with events in the future that need a fixed local time even if time zone offsets change.

http://fanf.livejournal.com/104586.html


As a collector of edge cases and other 'valuable weirdness', I'm at a loss to picture such a use case short of some obscure archival process with interoperability constraints... can you elaborate to clarify when this may be a problem?

I was fairly sure ISO8601 with correct timezone information for every item was completely sufficient for all my time needs, so if there's something it fails at, I'd like to know more.


Indeed; since the same abbreviation is used for multiple different timezones, it is difficult to collect a set of abbreviations that are consistent across a data set.

I wrote cpppo (https://github.com/pjkundert/cpppo.git) to work with industrial data communications in various timezones. The cpppo.history module assures fast and consistent handling (and serialization/deserialization) of time series data across a selected set of consistent timezone abbreviations.

If speed and consistency are an issue, it may be of interest.


I think dmc leaves too much to be implemented. Why dismiss a full project on PyPI for a half-written, untested one?


You're right. DMC is really just an experiment. I don't mean to disparage Arrow, it does improve the interfaces around datetime, I was just disappointed it didn't attempt to prevent these common timezone related bugs.

I bring up dmc for discussion, because I started it right after finding arrow. Building a full wrapper like this hadn't occurred to me until I saw arrow do it.


Cool!

Have you looked at the moment.js manual? They have a lot of really useful examples (I believe Arrow was inspired by Moment).


Suggestion: rename arrow.now() to arrow.localnow() to make it even clearer that it does not generate utc. I've run into this mistake many times with datetime.now() vs datetime.utcnow()


This library is a couple of years old already (version 0.1.6 hit PyPI in Nov 2012). A rename at this point seems unlikely.

Unlike datetime.now() and datetime.utcnow(), arrow.now() produces a datetime that includes tzinfo. As long as that's on there and correct, I think the risk you describe is low.
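The naive/aware distinction behind this risk can be seen with the standard library alone. A small sketch (the variable names are illustrative):

```python
from datetime import datetime, timezone

naive_local = datetime.now()               # no tzinfo: which zone is this?
aware_utc = datetime.now(timezone.utc)     # tzinfo attached: unambiguous
aware_local = datetime.now().astimezone()  # local wall time with its offset

try:
    naive_local < aware_utc
except TypeError as exc:
    mixing_error = exc   # Python refuses to compare naive with aware
else:
    mixing_error = None

print(naive_local.tzinfo, aware_utc.tzinfo, aware_local.tzinfo)
```

A value that carries its tzinfo can always be converted later; a naive one depends on everyone remembering which zone was meant.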


    """calpaterson's awesome project"""
    import arrow
    arrow.localnow = arrow.now
    del arrow.now


I'm the other way around, consistency with the standard library (even if that decision was a mistake) will increase learn-ability.


From the webpage:

> Python’s standard library and some other low-level modules have near-complete date, time and time zone functionality but don’t work very well from a usability perspective:

Why be consistent with something that suffers from usability problems?


It's a shame that Python's time/date stuff is so wonky. I see stuff like dateutil and arrow and just wish I didn't have to go elsewhere for such core functionality.

Not that it's a big deal to loop in external dependencies or anything, I just see newer (and experienced) Python devs trip up on stuff like TZ aware/naive times. Also, dateutil's relativedelta is so nice compared to the built-in timedelta!
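To illustrate the gap: the built-in `timedelta` only knows fixed spans (days, seconds, microseconds), so "one month later" has to be written by hand. The `add_months` helper below is a hypothetical stdlib-only sketch of the kind of calendar arithmetic that a relative delta is meant to provide; it is not dateutil's implementation.

```python
import calendar
from datetime import date, timedelta

def add_months(day, months):
    """Hypothetical helper: shift by calendar months, clamping the day."""
    index = day.year * 12 + (day.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return day.replace(year=year, month=month, day=min(day.day, last_day))

jan31 = date(2014, 1, 31)
print(jan31 + timedelta(days=30))   # fixed span: lands on 2 March
print(add_months(jan31, 1))         # calendar month: clamps to 28 February
```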


> It's a shame that Python's time/date stuff is so wonky. I see stuff like dateutil and arrow and just wish I didn't have to go elsewhere for such core functionality.

That's on point, and you're not alone. It's a little frustrating when you have to take a detour to get something done. There's that extra hurdle. That said, user friendliness, developer friendliness, and maintainability are all about using the best tools for the job. If arrow is that tool, then so be it.


> user friendliness, developer friendliness, and maintainability are all about using the best tools for the job

Definitely. The most unfortunate part of this is that people who are newer to the ecosystem won't really know where to go or what they're even looking for, necessarily.


author here: the other commenters here are correct, I used .now() and .utcnow() in order to match the stdlib API.


Arrow is ambiguous when dealing with timezones, and a lot of the high-level functionality ideas were copied from Delorean. Delorean has many more sensible defaults, is clear about educating people about naive vs. localized datetimes, and is a lot more sane.

The documentation is very educational too, and well worth the read: http://delorean.readthedocs.org/en/latest/


It would be good if the manual had all the examples that moment.js has (which arrow was inspired by).


Is there any language that got the date & time library right the first time? It seems like people always suggest to use a third party library. JavaScript has Moment.js, Java has Joda-Time, and now Python has Arrow.


Go's time package is nice (http://golang.org/pkg/time/), although some people complain about parsing date/time strings using a reference string ("Mon Jan 2 15:04:05 -0700 MST 2006") vs. PHP or Python style format strings.
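For comparison, here is what the Python-style format string looks like for a similar shape. This is a small sketch; the Go layout in the comment is shown only as the counterpart and is not executed.

```python
from datetime import datetime

# Go's layout for this shape would be "2006-01-02 15:04:05 -0700";
# the Python equivalent spells each field as a % directive.
FORMAT = "%Y-%m-%d %H:%M:%S %z"

parsed = datetime.strptime("2014-08-05 15:04:05 -0700", FORMAT)

print(parsed.isoformat())
print(parsed.strftime(FORMAT))
```

The directive form is compact but has to be memorised or looked up, which is the readability complaint the reference-string approach tries to address.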


That's a really cool idea. I'm always commenting my code with reference strings because format strings are so unreadable.


The Dart core library team was very conservative regarding dates and time for this reason.

They kept things down to just DateTime (timezone aware, but only supports local and UTC) and Duration, and assumed that anything more advanced should be outside the core.

I'm not sure if they got it right or not, but we haven't seen anyone want to replace the built in types yet. I think we've always expected a Joda-Time like library eventually, but hopefully DateTime and Duration are good, so it's additive.

(I'm on the Dart team, but not on the core lib team)


Rebol comes built-in with nice date & time features.

While it doesn't have all the bells and whistles you would get with a 3rd party library like the one posted here it does have an advantage that date & time are both first class datatypes:

  >> type? 6-Aug-2014
  == date!

  >> type? 01:20:59   ;  1 hour, 20 mins and 59 secs
  == time!
 
  >> 6/8/2014 - 5/8/2014
  == 1
 
  >> difference 6/8/2014 5/8/2014
  == 24:00

  >> yesterday: now - 1
  == 5-Aug-2014/13:12:33+1:00

  >> yesterday/day
  == 5
 
  >> one-hour-twenty-mins: 1:20
  == 1:20
 
  >> one-hour-twenty-mins/minute
  == 20
 
  >> one-hour-twenty-mins/hour 
  == 1

  >> two-hours: one-hour-twenty-mins + 0:40 
  == 2:00
refs: http://www.rebol.com/r3/docs/datatypes/date.html | http://www.rebol.com/r3/docs/datatypes/time.html


I don't know if it was in there in the early versions of the .NET framework, but C# and .NET have decent time libraries standard. I didn't run into nearly as many warts and annoyances with .NET dates & times as I did with Python.


I agree - I came to Python after doing .Net and was surprised at how confusing the time/date handling is - particularly as pretty much everything else in Python is so nice!


Good point. Date & Time is one of those subjects that seems relatively easy to the uninformed, but actually requires some pretty careful consideration and domain-specific knowledge to get correct.


Date/Time APIs are a core part of Fantom

http://fantom.org/doc/docLang/DateTime.html


Java pretty much absorbed Joda-Time in version 8.


Date handling in PostgreSQL has always seemed ideal compared with what was available in Python. I'm happy to read about arrow.


And I've been even happier reading about Delorean. http://delorean.readthedocs.org/en/latest/quickstart.html


Y'all may wanna check out python dateutil. https://labix.org/python-dateutil


Nice. One thought: why do you call it arrow? I know you can call it anything you want, but one reason requests is so popular is because the name totally makes sense. My little 0.2 is why not rename it to pydate or something more intuitive.

What's the difference between .utcnow() and .now()? Is one supposed to be an alias, or does .now() default to the time the platform is set to?

I've opened an issue regarding calling convention. What do HNers think?

https://github.com/crsmithdev/arrow/issues/125


I presume it's a reference to https://en.wikipedia.org/wiki/Arrow_of_time.


The other commenters here are correct, it is a reference to Time's Arrow (the book, or Star Trek: TNG episode), and other directional time metaphors :)


I imagine it's because of "time flies like an arrow".


"fruit flies like a banana"


utcnow() is the current time in the UTC timezone with no timezone attached; now() is based on the timezone your machine is set to.


The library looks really nice usability wise.

The arrow.get function tries a bit too hard to be user-friendly; it leaves me wondering if a value could have an unexpected interpretation.

One use-case that seems to be missing is time deltas. There is support for time iteration but I don't see a good way to transform two timestamps into a time interval.


> One use-case that seems to be missing is time deltas.

I just fired up ipython and tried:

    >>> import arrow
    >>> t1 = arrow.now()
    >>> t2 = arrow.now()
    >>> t2-t1
    datetime.timedelta(0, 4, 864716)

It appears the difference operator returns a datetime.timedelta object.
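For reference, the three positional fields of that `timedelta` are days, seconds and microseconds. A short stdlib-only sketch of working with such an interval (the `start` value is made up):

```python
from datetime import datetime, timedelta

delta = timedelta(0, 4, 864716)      # days, seconds, microseconds
print(delta.total_seconds())         # the whole interval as one float

# The same kind of object comes out of subtracting two plain datetimes.
start = datetime(2014, 8, 5, 12, 0, 0)
end = start + delta
print(end - start == delta)
```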


This is a very interesting problem to tackle for all runtimes.

In my experience, the main issues are: [1] How do we store the date / time fields in the database tier [2] How do we display those fields to users in multiple time zones and cultures and gather validated input from those cultures [3] How do we do date / time math in a simple manner

Issue [1] has implications for manipulating data directly (i.e. querying or reporting). Should the data be stored in UTC or in some "local timeframe"?

Issue [2] always seems to trip some developers: how do we render a time or date in a way that makes sense in a global world (i.e. "8/5/2014" vs. "5/8/2014" vs. "5-8-2014" or 3:26 PM vs 3:26 etc).
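The ambiguity in issue [2] is easy to demonstrate: the same text parses to two different dates depending on the assumed convention. A minimal sketch with the standard library:

```python
from datetime import datetime

text = "8/5/2014"

as_us = datetime.strptime(text, "%m/%d/%Y").date()   # month first
as_eu = datetime.strptime(text, "%d/%m/%Y").date()   # day first

# An unambiguous rendering for storage or exchange is ISO 8601.
print(as_us.isoformat(), as_eu.isoformat())
```

This is why validated input needs to know the user's convention up front rather than guessing from the string.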

Issue [3] is always a pain too, based on whatever decisions were made for [1] and [2].

Test of awesomeness: if your code works in Hebrew or Thai cultures!


It would be cool if python could support date/timestamp literals.

I've implemented something similar in a query language I wrote, where dates are represented similar to ISO 8601 format:

     d'2014-12-01'
     t'2014-12-01 12:52'
     t'2014-12-01 12:52 PST'
     t'2014-12-01T12:52:20.0820Z'
     t'2014-12-01T12:52:20.0820+08:00'


It's 2 inconvenient characters longer for each one, but aliasing your date and time parsers to "d" and "t" and then adding parentheses to what you have written there seems like a good enough solution for the majority of cases.
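A sketch of that aliasing idea, assuming the standard library's ISO parsers are good enough for the inputs at hand. Named abbreviations such as `PST` are not handled by these parsers and would need something fuller.

```python
from datetime import date, datetime

# Short aliases standing in for the d'...' and t'...' literals.
d = date.fromisoformat
t = datetime.fromisoformat

release = d("2014-12-01")
meeting = t("2014-12-01 12:52")
with_offset = t("2014-12-01T12:52:20+08:00")

print(release, meeting, with_offset)
```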

Do you have so many time literals in your code?


Different literal types are only necessary when the parser needs to treat the string differently - that's why you need them for raw strings for instance.

When you don't need to do that, a function call is a much better answer. There are many things that could be string literals but aren't because they don't have to be.


If you just go with pure ISO8601 then you won't need to differentiate between dates, times, durations and intervals; they can each be specified unambiguously. Up to you, of course; it might not make sense in your use-case.


That would be part of the plan.

My use case is the IPython astrophysicist use case, where users tend to be very interactive with their dates and times. In fact, that's why I made the query language, because it was necessary for dataset discovery (i.e. return datasets between this time and that time). Of course, we often use a variety of types of times, from Julian day to Mission Event Time (where an arbitrary epoch is chosen that's usually close to some significant event, usually full funding, mission launch, first light, etc.).
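Both kinds of time scale reduce to simple arithmetic on an aware datetime. The helpers below are a hypothetical sketch (the function names and the `launch` epoch are made up), and they ignore leap seconds, which real astronomical code would need to consider.

```python
from datetime import datetime, timezone

def julian_day(moment):
    """Julian Day number of an aware datetime (proleptic Gregorian)."""
    utc = moment.astimezone(timezone.utc)
    day_fraction = (utc.hour * 3600 + utc.minute * 60 + utc.second
                    + utc.microsecond / 1e6) / 86400
    # Ordinal 1 is 0001-01-01, whose midnight is JD 1721425.5.
    return utc.toordinal() + 1721424.5 + day_fraction

def mission_elapsed_seconds(moment, epoch):
    """Seconds since a mission epoch (the epoch is the caller's choice)."""
    return (moment - epoch).total_seconds()

j2000 = datetime(2000, 1, 1, 12, 0, tzinfo=timezone.utc)
launch = datetime(2014, 8, 5, 0, 0, tzinfo=timezone.utc)   # made-up epoch

print(julian_day(j2000))
print(mission_elapsed_seconds(j2000, launch))
```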

Of course we could always just use function calls, but the ability to have literals is really nice.

Maybe I'm just a dreamer, but a similar extension to javascript and JSON would be really nice as well, given that you need a schema to understand the proper parsing of dates, otherwise you end up with a string, number, additional type field, or a fallback to Hungarian notation.


Looks promising! Although the author said that moment.js was an inspiration, I'm glad that he didn't implement .replace functionality as mutation like moment does. It was a constant source of errors.


Fun fact, in the oldest, original version, arrow objects were mutable, and this was ripped out in 0.2 for exactly that reason.


Ah, this is lovely. If there's one routine thing I trip up over again and again, it's times, every time. Big fan, nice work. Will definitely look at using this next project.


It's so easy to overlook something and not realize it is causing issues for users for a very long time.


Looks nice. I noticed the ceil function has an offset of one microsecond, this way it is not consistent with e.g. ceil in numpy. What was the reason for this?
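One possible reading, offered as a guess rather than the author's stated reason: if floor and ceil are meant to be the inclusive bounds of a time span, then ceil has to stop one microsecond short of the next span so that adjacent spans do not overlap. A stdlib-only sketch of that convention, with hypothetical helper names:

```python
from datetime import datetime, timedelta

def floor_hour(moment):
    """First instant of the hour containing `moment`."""
    return moment.replace(minute=0, second=0, microsecond=0)

def ceil_hour(moment):
    """Last representable instant of that hour: one microsecond
    before the next hour begins."""
    return floor_hour(moment) + timedelta(hours=1) - timedelta(microseconds=1)

sample = datetime(2014, 8, 5, 12, 34, 56)
print(floor_hour(sample), ceil_hour(sample))
```

Under this convention `floor <= t <= ceil` selects exactly one hour, which differs from a numeric ceil that rounds up to the next whole value.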


Very nice! This replaces a lot of code I'm manually implementing now, to do things like humanize() and floor()/ceil()


> This replaces a lot of code I'm manually implementing now, to do things like humanize()

Why would you do it by hand anyway when Babel already does it? http://babel.pocoo.org/docs/api/dates/#babel.dates.format_ti...


I've used this in the past. Good, well-written library. The stdlib datetime/time modules are such a PITA sometimes.


There are too many ways to do something, best implement the One True Way.

There's an obligatory xkcd somewhere on this...



Funny and true, but this is a separate case. We're not talking about implementing a new standard, we're talking about easing development on your next project.


This might be what I am looking for since I looked at the datetime docs the first time years ago.


re. parsing and formatting tokens: because what we needed was yet another date/time pattern format which looks like LDML's but behaves completely differently.


Very nice, thank you, definitely using this in new code.


It would be better to stop reinventing this wheel.


We can, but only after someone actually produces a round wheel, with no gaps, sufficient spokes and a hole for an axle. I've yet to see such a beast.



