
This looks awesome!

I'm curious. How are you dealing with the different namespaces and range of information used in the iXBRL/XBRL files? When I tried to parse these a while back, I ended up writing a lot of ugly try/except stuff.



Hi, yeah, that was a real pain for me. There are two namespaces in use, http://www.xbrl.org/uk/gaap/core/2009-09-01/ and http://www.xbrl.org/uk/cd/business/2009-09-01, and either one may be properly declared in the file or just used without any declaration (I hate XML namespaces!). In the end I used this approach. It's Ruby, using Nokogiri:

  require 'nokogiri'

  # helper methods

  def content_ar_from(doc, xpath)
    doc.xpath(xpath).map {|d| d.content.strip}
  end

  def first_value_for(doc, xpath)
    content_ar_from(doc, xpath).first
  end


  # assumes the file has already been downloaded to 'path'
  doc = Nokogiri::XML(File.read(path))

  # try to find the declared namespace (doc.namespaces maps "xmlns:prefix" => URI)
  uk_gaap = doc.namespaces.select {|k, v| v.start_with?("http://www.xbrl.org/uk/gaap/core/2009-09-01")}
  uk_bus = doc.namespaces.select {|k, v| v.start_with?("http://www.xbrl.org/uk/cd/business/2009-09-01")}

  # if it's not declared, fall back to the conventional prefix
  uk_gaap_ns = uk_gaap.keys.first&.delete_prefix("xmlns:") || "uk-gaap"
  uk_bus_ns = uk_bus.keys.first&.delete_prefix("xmlns:") || "uk-bus"

  # example of getting the company number
  company_number = first_value_for(doc, "//*[@name='#{uk_bus_ns}:UKCompaniesHouseRegisteredNumber']")

  # example of getting the cash in hand
  cash_bank_in_hand = first_value_for(doc, "//*[@name='#{uk_gaap_ns}:CashBankInHand']")
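The prefix lookup can also be pulled out into a small pure-Ruby helper that is easy to test without a real filing. This is a sketch, not the code above: `prefix_for`, `concept_xpath`, `UK_GAAP_URI` and `UK_BUS_URI` are hypothetical names, and it assumes the hash has the same shape as Nokogiri's `doc.namespaces` (keys like `"xmlns:uk-gaap"`). It compares URIs exactly while ignoring one trailing slash, since the two URIs mentioned above are written inconsistently.

```ruby
# Hypothetical helpers, not part of Nokogiri.
# `namespaces` is assumed to look like Nokogiri's doc.namespaces, e.g.
#   { "xmlns" => "http://www.w3.org/1999/xhtml",
#     "xmlns:uk-gaap" => "http://www.xbrl.org/uk/gaap/core/2009-09-01" }
UK_GAAP_URI = "http://www.xbrl.org/uk/gaap/core/2009-09-01"
UK_BUS_URI  = "http://www.xbrl.org/uk/cd/business/2009-09-01"

# Return the prefix bound to `uri`, or `default` if the document never
# declares it. URIs are compared exactly, ignoring one trailing slash.
# A default (unprefixed "xmlns") binding has no usable prefix, so it also
# falls back to `default`.
def prefix_for(namespaces, uri, default)
  key, _uri = namespaces.find { |_k, v| v.chomp("/") == uri.chomp("/") }
  return default unless key&.start_with?("xmlns:")
  key.delete_prefix("xmlns:")
end

# iXBRL facts carry the concept as a prefixed name in their name attribute,
# e.g. name="uk-gaap:CashBankInHand", so we match on that exact string.
def concept_xpath(prefix, concept)
  "//*[@name='#{prefix}:#{concept}']"
end
```

With these, the cash lookup becomes `first_value_for(doc, concept_xpath(prefix_for(doc.namespaces, UK_GAAP_URI, "uk-gaap"), "CashBankInHand"))`. Keeping the namespace logic separate from the XML parsing means the fallback behaviour can be checked with plain hashes.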



