Normally you would start with a Parts of Speech algorithm, but with the limited context of flights (international only?) you could focus only on a "to" clause and a "from" clause, with the city/country found in the RHS of each clause.
Resolving ambiguous cities could start with a simple ranking algorithm based on a list of International/Domestic destinations and their corresponding traffic level.
NLU is hard but constrained domains like this are tractable, in my overly optimistic opinion.
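To make the "to clause / from clause" idea concrete, here is a minimal sketch in Python. The regexes and the `extract_route` helper are illustrative assumptions, not from any library, and they only handle simple single-sentence queries with capitalized place names:

```python
import re

# Grab the right-hand side of a "from" clause and a "to" clause.
# Each capture is non-greedy and stops at the other keyword,
# punctuation, or end of string.
FROM_RE = re.compile(r"\bfrom\s+([A-Z][\w .'-]*?)(?=\s+to\b|[,.?!]|$)")
TO_RE = re.compile(r"\bto\s+([A-Z][\w .'-]*?)(?=\s+from\b|[,.?!]|$)")

def extract_route(text):
    """Return (origin, destination) or None for a simple flight query."""
    src = FROM_RE.search(text)
    dst = TO_RE.search(text)
    if src and dst:
        return src.group(1).strip(), dst.group(1).strip()
    return None
```

This deliberately ignores ambiguity resolution; it just pulls candidate strings out, e.g. `extract_route("fly to Paris from Berlin")` yields `("Berlin", "Paris")`.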
Trying to do it with grammar is exactly why you get into the rabbit hole where you get mixed up about Old York and New York. There is that "New York City" that daddy doesn't live in no more, and the "New York State" I live in where you frequently find dirt roads that look passable on GPS and paper maps but aren't.
It is an ontological problem, not a grammatical problem.
You are right that "air traffic level" is key in curating your city and/or airport database. You would start with a list like
and work down from the top. At some point you can put together a "proof" that it works for the top 20% of the airports that get 80% of the traffic and work up from there to cover 95% of the traffic.
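The "cover 80%, then work up to 95%" curation step is easy to sketch: sort airports by traffic and take from the top until the cumulative share reaches the target. The traffic numbers below are made up purely for illustration:

```python
def airports_covering(traffic, target_share):
    """Smallest top-of-the-list subset whose traffic reaches target_share."""
    total = sum(traffic.values())
    chosen, covered = [], 0
    for airport, count in sorted(traffic.items(), key=lambda kv: -kv[1]):
        chosen.append(airport)
        covered += count
        if covered / total >= target_share:
            break
    return chosen

# Invented sample data; real curation would use published traffic stats.
sample = {"ATL": 50, "LHR": 30, "CDG": 10, "AMS": 6, "YYZ": 4}
```

With this sample, 80% coverage needs only the top two airports, while 95% pulls in four of the five.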
For any project like this you need to build a test set if not a training or evaluation set. If you are not willing to look at 1000 examples by hand then don't get started.
People are willing to spend six months fighting with everything from procurement to device drivers to run the same models on the same digits that everybody else runs, but it is hard to get people to make evaluation sets.
I can't guarantee that the people who make those sets will succeed, but it is certain that those who don't make them will fail.
B: Each has a finite number of possible spellings.
A x B is, like, a large number. It would take a lot of typewriters, and there might be new spellings, new destinations and new places to abandon... ehh, I mean leave.
Which is the OP's issue as well: they'd be better off letting somebody else like Google parse their text instead of trying to reinvent the wheel.
That said, if I were to implement this from scratch, I would not start with a positive list of cities. Rather, I would implement a couple of patterns like: "from X to Y", "X to Y", "to Y from X".
But invariably there are going to be ambiguities. How do you deal with a city called "from to" :-D? How do you parse "from from to to to from"? I'd probably tokenize it using place lists, like
"from (from to) to (to from)"
"from (from to) (to to) from"
"from from (to to) to from"
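A toy version of the "tokenize using place lists" idea: enumerate every way to read the sentence against a place list and keep the readings where both slots are known places. This sketch only covers the strict "from X to Y" pattern (not the other two bracketings above), and the pathological place names are of course invented:

```python
PLACES = {"from to", "to from", "to to", "from from"}

def parses(sentence, places=PLACES):
    """All (origin, destination) readings of 'from X to Y'."""
    tokens = sentence.split()
    results = []
    if not tokens or tokens[0] != "from":
        return results
    # Try every "to" as the separator between origin and destination.
    for k in range(2, len(tokens) - 1):
        if tokens[k] == "to":
            x = " ".join(tokens[1:k])
            y = " ".join(tokens[k + 1:])
            if x in places and y in places:
                results.append((x, y))
    return results
```

On "from from to to to from" this finds exactly one valid reading under that pattern: origin "from to", destination "to from".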
A side comment, for someone like me who isn't an NLP expert: it seems strange that there are no libraries that let me parse a string according to some rules in all possible ways (like a tree) and pick the one that is most probable. I'm sure this exists, but it is hard to search for...
I think you're describing the Viterbi Algorithm.
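For the curious, here is a minimal Viterbi sketch: score every tag path through a tiny hand-made HMM and keep the most probable one. The two states, the transition/emission tables, and all the probabilities below are invented purely for illustration:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    # best[s] = (probability, path) of the best path ending in state s
    best = {s: (start_p[s] * emit_p[s].get(obs[0], 0.0), [s]) for s in states}
    for word in obs[1:]:
        best = {
            s: max(
                (p * trans_p[prev][s] * emit_p[s].get(word, 0.0), path + [s])
                for prev, (p, path) in best.items()
            )
            for s in states
        }
    return max(best.values())

states = ["FILLER", "CITY"]
start = {"FILLER": 0.9, "CITY": 0.1}
trans = {"FILLER": {"FILLER": 0.1, "CITY": 0.9},
         "CITY": {"FILLER": 0.9, "CITY": 0.1}}
emit = {"FILLER": {"from": 0.5, "to": 0.5},
        "CITY": {"london": 0.5, "paris": 0.5}}
```

Running it on `["from", "london", "to", "paris"]` recovers the alternating FILLER/CITY tagging, which is exactly the "enumerate all parses, keep the most probable" behavior being asked about.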
So what you could do is take a general database of all flights (which is structured data and easy to work with) and then the problem is to label the unstructured input data with the probability it relates to a particular source, destination pair.
For example, there are no flights from York (lovely though York is) to Venice, so that mistake isn't one such a model could even make. Whereas there are a lot of flights to and from New York (including to Venice), and since that number is large it's a much higher probability labeling than some others.
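A sketch of that labeling idea: pairs with no scheduled flights get score zero, and the rest are weighted by how busy the route is. The route table and flight counts below are invented for illustration (a real system would use an actual flight-schedule database):

```python
ROUTES = {("New York", "Venice"): 40,
          ("London", "Venice"): 25,
          ("New York", "London"): 300}

def best_pair(origin_candidates, dest_candidates, routes=ROUTES):
    """Most plausible (origin, destination) pair, or None if no route exists."""
    scored = [(routes.get((o, d), 0), (o, d))
              for o in origin_candidates for d in dest_candidates]
    count, pair = max(scored)
    return pair if count > 0 else None
```

Given the candidates `["York", "New York"]` for the origin and `["Venice"]` for the destination, "York to Venice" scores zero (no such route), so "New York to Venice" wins automatically.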
The question smells like one of those "Solve my homework/work assignment for me" questions.
The horde of mechanical turks is really expensive, but with near-infinite VC capital, many labor-intensive white-collar markets can indeed be thoroughly disrupted.
Selling ten-dollar bills for five dollars each is guaranteed to gain traction, and you just need to convince investors that you're just a few quarters away from being able to print your own.
spaCy has a handy demo of NER (although it appears to be down for me right now):
In the example output, the poster shows the full location being inferred, not just the name.
Some examples where that gets tricky:
If I want driving instructions "from London to Hamilton" odds are I want London, Ontario to Hamilton, Ontario.
But what about "from London to York"? Without knowing more, I can't tell if that is in England or Ontario, and "knowing more" here involves a lot of guesses about in what context someone would qualify a name based on where they live relative to different locations with those names.
If I want "London to Paris" I could mean London, Ontario to Paris, Arkansas, but odds are very much I don't, because it seems unlikely someone far from either place would leave it unqualified. If that someone is close to either Ontario or Arkansas, it's tempting to think they'd still refer to the bigger place... But I see enough Canadians on e.g. London Craigslist assuming it's for London, Ontario that this seems to depend on additional context.
But here be dragons. Qualified names are often still tricky to interpret unambiguously. Let's say I've, against all odds, been to Paris, Wisconsin once, and I want to go back there. Chances are I might write "to Paris, Wisconsin" or similar... But which one? Paris, Grant County, or Paris, Kenosha County? Or I might even write "to Paris, USA" if I wasn't painfully aware there's at least a dozen Paris-es in the US.
And what about when I mess them up? When I started writing the paragraph above, I first mis-remembered and wrote Paris, Wyoming... Joy...
From experience, you can get decent results on average with a dataset of place names, sizes and populations, assuming that the bigger or more populated a place is, the larger the region in which it is likely to be the best choice. But many cases will remain ambiguous.
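That size/population heuristic can be sketched as: prefer the candidate whose population is large relative to its distance from the user. Flat lat/lon distance below is a crude stand-in for real geodesic distance, and the coordinates and populations are rough illustrative values:

```python
import math

def disambiguate(candidates, user_lat, user_lon):
    """Pick the (name, region, pop, lat, lon) tuple with the best
    population-over-distance score for this user's location."""
    def score(c):
        _, _, pop, lat, lon = c
        return pop / (1.0 + math.hypot(lat - user_lat, lon - user_lon))
    return max(candidates, key=score)

LONDONS = [
    ("London", "England, UK", 8_900_000, 51.5, -0.13),
    ("London", "Ontario, CA", 400_000, 42.98, -81.25),
]
```

A user near southwestern Ontario gets London, Ontario despite its far smaller population, while a user in the UK gets London, England; the genuinely ambiguous middle cases are exactly the ones this heuristic can't settle.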
But when you look at names that are very common and where many alternatives exist that are of a wide range of sizes, spread all over the place, that too becomes difficult. Examples: "from Santa Cruz to San Jose". California, right, because you have both names in the same state? Except that also applies to Arizona. Or New Mexico. To know whether people there would be likely to talk about them without specifying more detail, you need additional knowledge for each of these locations, and e.g. try to guess when some of the locations are too small for even relative "locals" to ignore them. And those are just the examples in the US that have both names in the same state, but there are dozens of both names across the Americas and further afield.
(As you might tell, I've spent too much time having to deal with location data...)
And sitting here in Toronto, with an available-to-google frequent travel history to London, Ontario, searching "london ON to paris ON" is not enough for google to know what I mean.
This stuff is hard.
I was literally just thinking earlier that, at the least, if someone finds something to add or correct, it will be awesome, because it will just emphasise how hard this problem is even for humans.
(I'm in Atlanta; it's probably a safe guess that if I were making this query legitimately I'm asking for the Alabama one.)
Similarly for Birmingham to Oxford; there are reasonably prominent towns called Oxford in Alabama, Mississippi, and maybe Georgia, but it still insists on attempting to find a transatlantic driving route and failing.
With enough context I could leave off any additional info when talking with a human, but there would likely never be "enough context" when asking a computer.
Given that those strings are going to be on the short side (you don't need to anticipate someone writing an epic saga in dactylic verse to describe a flight) the whole set of such sentences is probably finite and relatively small and can be represented by an equally finite and much smaller set of rules. So while alvas' comment is spot on regarding the amount of work one needs to do to cover all the exceptions and edge cases that might crop up even in such a small subset of natural English, it is not a problem so big as to be impossible to solve with some elbow grease.
There is much precedent of solving much more complex problems with hand-crafted grammar rules. For a very recent example, the latest online version of Magic the Gathering, M:tG Arena, apparently uses a hand-crafted context-free grammar to represent the language used on Magic cards. The grammar is used in a parser that executes cards as scripts in a scripting language. Creating a grammar for the entire M:tG language is a job that a dedicated programming team can accomplish in a reasonable amount of time, but even representing a small but substantial subset of that language is a goal that is in the range of, er, let's say, ah, a degree student of Computer Science in her final year :)
 Just let me find that link. I got it around here somewhere...
Yep, here we go:
Magic Arena's GRE: The Groundwork for future digital products, in Chris Clay's Interview
To dispel any misconceptions: the GRE is not a machine learning approach to Natural Language Processing. It's more of a compiler/parser that uses rules and logic we've crafted. We have a dictionary that defines the parts of speech and some semantic data for "Magic Words". We also have a huge list of syntax rules for "Magic Sentences", similar to a Context Free Grammar. With this, and our good friend the Natural Language Toolkit, we generate a parse tree for ability text. The tree groups individual words into phrases, multiple phrases into sentences, and all the sentences under the ability as a whole. We then process that representation of an ability into higher-level ideas, specific to Magic. "Draw a card" is no longer just a verb and an object, but is an action and entity with regards to game play. We take verb 'atoms' (destroy, discard, exile) and use them to construct molecules ("triggered ability", "sequence of effects", "replacement effect"). With that semantic representation of the ability, we can then basically compile it into code. That code is then added to the GRE, and executes when you play Arena!