
GDELT 2.0: new release of open Global Event Database updated every 15 min - bradleysmith
http://blog.gdeltproject.org/gdelt-2-0-our-global-world-in-realtime/
======
bradleysmith
new features:

15 Minute Updates. Access the world’s breaking events and reaction in near-
realtime as both the GDELT Event and Global Knowledge Graph now update every
15 minutes.
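
For anyone wanting to follow the 15 minute cycle programmatically, one plausible approach is to poll a small listing of the newest files and fetch only what has not been seen yet. The sketch below covers just the parsing and de-duplication step. The three-column `size hash url` line format, the `example.org` URLs, and the names `UpdateEntry`, `parse_update_listing`, and `new_entries` are assumptions for illustration, not something stated in the announcement.

```python
from typing import List, NamedTuple, Set


class UpdateEntry(NamedTuple):
    size_bytes: int
    checksum: str
    url: str


def parse_update_listing(text: str) -> List[UpdateEntry]:
    """Parse lines of the assumed form '<size> <hash> <url>'."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        size, checksum, url = parts
        entries.append(UpdateEntry(int(size), checksum, url))
    return entries


def new_entries(entries: List[UpdateEntry], seen_urls: Set[str]) -> List[UpdateEntry]:
    """Return only the entries whose URL has not been processed yet."""
    return [e for e in entries if e.url not in seen_urls]


# Hypothetical sample listing; real file names and hosts will differ.
SAMPLE = """\
150383 297a16b493de7cf6ca809a7cc31d0b93 http://example.org/20150218230000.export.CSV.zip
318084 bb27f78ba45f69a17ea6ed7755e9f8ff http://example.org/20150218230000.mentions.CSV.zip
"""

if __name__ == "__main__":
    for entry in new_entries(parse_update_listing(SAMPLE), seen_urls=set()):
        print(entry.url, entry.size_bytes)
```

A poller built on this would run roughly every 15 minutes and add each processed URL to `seen_urls`.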

Realtime Translation of 65 Languages. GDELT 2.0 brings with it the public
debut of GDELT Translingual, representing what we believe is the largest
realtime streaming news machine translation deployment in the world: all
global news that GDELT monitors in 65 languages, representing 98.4% of its
daily non-English monitoring volume, is translated in realtime into English
for processing through the entire GDELT Event and GKG/GCAM pipelines. GDELT
Translingual is designed to allow GDELT to monitor the entire planet at full
volume, creating the very first glimpses of a world without language barriers.
A special emphasis on locations and names makes GDELT 2.0 likely the largest
multilingual geocoding system in the world.

Realtime Measurement of 2,300 Emotions and Themes. GDELT 2.0 also brings with
it the debut of GDELT Global Content Analysis Measures (GCAM), representing
what we believe is the largest deployment of sentiment analysis in the world:
bringing together 24 emotional measurement packages that together assess more
than 2,300 emotions and themes from every article in realtime, with multilingual
dimensions natively assessing the emotions of 15 languages (Arabic, Basque,
Catalan, Chinese, French, Galician, German, Hindi, Indonesian, Korean, Pashto,
Portuguese, Russian, Spanish, and Urdu). GCAM is designed to enable
unparalleled assessment of the emotional undercurrents and reaction at a
planetary scale by bringing together an incredible array of dimensions, from
LIWC’s “Anxiety” to Lexicoder’s “Positivity” to WordNet Affect’s “Smugness” to
RID’s “Passivity”.

High Resolution View of the Non-Western World. Over the last few months we’ve
embarked upon an ambitious initiative to vastly expand GDELT’s knowledge of
the media systems of the non-Western world. Working closely with governments,
think tanks, academics, NGOs, and citizens on the ground throughout the world,
we have been working country-by-country to try to build the highest resolution
inventory possible of the media systems of the non-Western world. While we
still have a long way to go and the fluidity of the world’s media ensures that
this will be a perpetual task, we are incredibly excited by the ability of
this high resolution inventory, coupled with GDELT Translingual’s ability to
translate 98.4% of this material in realtime, to give voice to the most remote
corners of the world in near-realtime.

Relevant Imagery, Videos, and Social Embeds. A large fraction of the world’s
news outlets now specify a hand-selected image for each article to appear when
it is shared via social media that represents the core focus of the article.
GDELT identifies this imagery in a wide array of formats including Open Graph,
Twitter Cards, Google+, IMAGE_SRC, and SailThru formats, among others. In
addition, GDELT also uses a set of highly specialized algorithms to analyze
the article content itself to identify inline imagery likely to be highly relevant
to the story, along with videos and embedded social media posts (such as
embedded Tweets or YouTube or Vine videos), a list of which is compiled. This
makes it possible to gain a unique ground-level view into emerging situations
anywhere in the world, even in those areas with very little social media
penetration, and to act as a kind of curated list of social posts in those
areas with strong social use.

Quotes, Names, and Amounts. The world’s news
contains a wealth of information on food prices, aid promises, numbers of
troops, tanks, and protesters, and nearly any other countable item. GDELT 2.0
now attempts to compile a list of all “amounts” expressed in each article to
offer numeric context to global events. In parallel, a new Names engine
augments the existing Person and Organization names engines by identifying an
array of other kinds of proper names, such as named events (Orange Revolution
/ Umbrella Movement), occurrences like the World Cup, named dates like
Holocaust Remembrance Day, on through named legislation like Iran Nuclear
Weapon Free Act, Affordable Care Act and Rouge National Urban Park Initiative.
Finally, GDELT also identifies attributable quotes from each article, making
it possible to see the evolving language used by political leadership across
the world.
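
To make the "amounts" idea concrete, here is a deliberately crude sketch that pulls number-plus-noun pairs out of a sentence with a regular expression. It is only a toy under stated assumptions: the pattern, the function name `extract_amounts`, and the sample sentence are all hypothetical, and whatever GDELT actually does is presumably far more sophisticated.

```python
import re

# A number (with optional thousands separators) followed by a single word.
AMOUNT_PATTERN = re.compile(r"\b(\d{1,3}(?:,\d{3})+|\d+)\s+([A-Za-z]+)")


def extract_amounts(text):
    """Return (count, object) pairs such as (300, 'troops')."""
    return [
        (int(number.replace(",", "")), noun)
        for number, noun in AMOUNT_PATTERN.findall(text)
    ]


if __name__ == "__main__":
    sentence = "About 2,000 protesters faced 300 troops and 12 tanks."
    print(extract_amounts(sentence))
    # [(2000, 'protesters'), (300, 'troops'), (12, 'tanks')]
```

A toy like this misses spelled-out numbers, currencies, and multi-word objects, which hints at why a dedicated engine is needed.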

Tracking Event Discussion Progression. Under the previous version of GDELT,
only the first URL mentioning a given event was recorded, even if the event
was mentioned in a hundred separate articles. GDELT 2.0 adds a new “Mentions”
table that records every mention of an event over time, along with the
timestamp the article was published. This allows the progression of an event
through the global media to be tracked, identifying outlets that tend to break
certain kinds of events the earliest or which may break stories later but are
more accurate in their reporting on those events. Combined with the 15 minute
update resolution and GCAM, this also allows the emotional reaction and
resonance of an event to be assessed as it sweeps through the world’s media.
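
As a rough illustration of what a per-mention table makes possible, the sketch below finds which outlet reported each event first and how far behind the first report each outlet tends to be. The row layout `(event_id, outlet, timestamp)`, the outlet names, and the function names are hypothetical; the real Mentions table has its own schema.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows: (event_id, outlet, publication timestamp)
MENTIONS = [
    (1001, "outlet-a.example", datetime(2015, 2, 19, 10, 0)),
    (1001, "outlet-b.example", datetime(2015, 2, 19, 10, 15)),
    (1001, "outlet-c.example", datetime(2015, 2, 19, 11, 30)),
    (1002, "outlet-b.example", datetime(2015, 2, 19, 12, 0)),
    (1002, "outlet-a.example", datetime(2015, 2, 19, 12, 45)),
]


def first_reporters(mentions):
    """Map each event id to the (outlet, timestamp) of its earliest mention."""
    earliest = {}
    for event_id, outlet, ts in mentions:
        if event_id not in earliest or ts < earliest[event_id][1]:
            earliest[event_id] = (outlet, ts)
    return earliest


def mean_lag_minutes(mentions):
    """Average delay per outlet between an event's first mention and the outlet's own."""
    first = first_reporters(mentions)
    lags = defaultdict(list)
    for event_id, outlet, ts in mentions:
        lags[outlet].append((ts - first[event_id][1]).total_seconds() / 60)
    return {outlet: sum(values) / len(values) for outlet, values in lags.items()}


if __name__ == "__main__":
    print(first_reporters(MENTIONS)[1001][0])  # outlet-a.example
    print(mean_lag_minutes(MENTIONS))
```

Outlets with a consistently low mean lag are the ones that tend to break stories earliest in this toy data.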

Over 100 New GKG Themes. There are more than 100 new themes in the GDELT
Global Knowledge Graph, ranging from economic indicators like price gouging
and the price of heating oil to infrastructure topics like the construction of
new power generation capacity to social issues like marginalization and
burning in effigy. The list of recognized infectious diseases, ethnic groups,
and terrorism organizations has been considerably expanded, and more than 600
global humanitarian and development aid organizations have been added, along
with global currencies and massive new taxonomies capturing global animals and
plants to aid with tracking species migration and poaching.

Source Geographic Background Knowledge. GDELT now assesses the geography of
every outlet it monitors over time and estimates its physical location on
earth, incorporating that information back into the geocoding process to
maximize its ability to recognize the geography of local media (a small rural
radio station likely assumes its listeners know what country it is based in
and thus does not clarify every mention of a local location with the
corresponding country name).
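
The idea of feeding an outlet's own location back into geocoding can be sketched as a simple tie-breaker: when a place name is ambiguous, prefer the candidate in the outlet's country. The tiny gazetteer, its approximate coordinates, and the name `resolve_place` below are made up for illustration and are not GDELT's actual method.

```python
# Hypothetical gazetteer: place name -> candidates as (country code, lat, lon).
# The first candidate is treated as the default when nothing else is known.
GAZETTEER = {
    "Springfield": [("US", 39.80, -89.64), ("AU", -27.65, 152.92)],
    "Cordoba": [("ES", 37.89, -4.78), ("AR", -31.42, -64.18)],
}


def resolve_place(name, source_country=None):
    """Pick a candidate for `name`, preferring the outlet's own country."""
    candidates = GAZETTEER.get(name, [])
    if not candidates:
        return None
    for candidate in candidates:
        if candidate[0] == source_country:
            return candidate
    return candidates[0]  # no match for the outlet's country: use the default


if __name__ == "__main__":
    print(resolve_place("Cordoba"))        # default candidate (ES)
    print(resolve_place("Cordoba", "AR"))  # outlet assumed to be based in Argentina
```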

Global Knowledge Graph Now in BigQuery. The GDELT Global Knowledge Graph is
now available in Google BigQuery, allowing you to query and explore the GKG in
realtime and to integrate it into queries of the Event dataset. In fact, the
Event, Mentions, and GKG tables are now all in BigQuery and updated every 15
minutes, allowing you to leverage BigQuery’s enormous power to perform mass-
scale analytics in near-realtime on our changing planet.
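
A query combining the Event and Mentions tables might look something like the sketch below, which only builds the SQL text. The table names `gdelt-bq.gdeltv2.events` and `gdelt-bq.gdeltv2.eventmentions` and the columns `GLOBALEVENTID` and `EventCode` are assumptions that should be checked against the published schema, and `build_mentions_query` is a hypothetical helper.

```python
def build_mentions_query(event_code: str, limit: int = 20) -> str:
    """Build SQL ranking events of one type by how many articles mention them."""
    if not event_code.isdigit():
        # Event codes are assumed to be numeric strings; reject anything else
        # rather than interpolating untrusted text into the query.
        raise ValueError("event_code must contain digits only")
    return f"""
SELECT e.GLOBALEVENTID, COUNT(*) AS mention_count
FROM `gdelt-bq.gdeltv2.events` AS e
JOIN `gdelt-bq.gdeltv2.eventmentions` AS m
  ON e.GLOBALEVENTID = m.GLOBALEVENTID
WHERE e.EventCode = '{event_code}'
GROUP BY e.GLOBALEVENTID
ORDER BY mention_count DESC
LIMIT {int(limit)}
""".strip()


if __name__ == "__main__":
    print(build_mentions_query("145", limit=5))
```

The resulting string can be pasted into the BigQuery console or handed to whichever client library you prefer.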

