So knowledge like: Did the passenger have kids on board? Was the passenger nobility? Was the passenger travelling first class? Where was the passenger located on the ship after boarding? And how do these factors influence survivability? And reality like: The actual sinking of the Titanic? If your model concludes that a passenger who is nobility, travelling first class, close to the exits, and without family has a higher chance of surviving, then this is fancy nonsense or a false belief? You make a really strange case for your view.

 Correlation does not imply causation. There were many more relevant but "invisible" variables, which were probably related to genetic factors, such as the ability to withstand exposure to cold water, the ability to calm oneself down and avoid panic (self-control in general), a survival instinct strong enough to literally fight the others, etc. The variables you have described, except the age of a passenger, are visible but irrelevant. And pure luck must carry far more weight, and it is, obviously, related to favorable genetic factors, age, health and fitness.
 This challenge is not about causal inference. I do agree it is more of a toy dataset, to get started with the basics, and that there are a lot of other variables that go into survivability. But to say these variables, except for age, are irrelevant is mathematically unsound: You can show with cross-validation and test set performance that a model using these variables generalizes (around 0.80 ROC AUC). You can do statistical or information-theoretic tests that show the majority of these variables carry significant signal for predicting the target. In real life it is also very rare to have free pickings of the variables you want. Some variables have to be substituted with available ones. The Titanic story is there to make things interesting for beginners. One could leave out all the semantics of this challenge, anonymize the variables and the target, and still use this dataset to learn about going from a table with variables to a target. In fact, doing so teaches you to leave your human bias at the door. Domain experts get beaten on Kaggle, because they think they need other variables, or that some variables (and their interactions) can't possibly work. Let the data and evaluation metric do the talking.
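 To make the cross-validation point concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the rows are synthetic stand-ins with made-up survival probabilities (not the real Titanic data), the "model" is just a per-group survival-rate lookup on two hypothetical features, and the helper names are invented here. It does not reproduce the 0.80 figure; it only shows how a held-out ROC AUC clearly above 0.5 demonstrates that the variables carry predictive signal.

```python
import random

def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def make_synthetic_rows(n, rng):
    """Synthetic stand-in rows: ((pclass, female), survived). Probabilities are made up."""
    rows = []
    for _ in range(n):
        pclass = rng.choice([1, 2, 3])
        female = rng.choice([0, 1])
        p = 0.15 + 0.5 * female + 0.1 * (3 - pclass)  # assumed, not estimated from real data
        rows.append(((pclass, female), 1 if rng.random() < p else 0))
    return rows

def fit_group_rates(train):
    """Toy model: survival rate per feature combination, plus the overall rate as fallback."""
    totals, hits = {}, {}
    for key, y in train:
        totals[key] = totals.get(key, 0) + 1
        hits[key] = hits.get(key, 0) + y
    overall = sum(y for _, y in train) / len(train)
    return {k: hits[k] / totals[k] for k in totals}, overall

def cross_validated_auc(rows, k=5):
    """Fit on k-1 folds, score the held-out fold, and average the k ROC AUC values."""
    folds = [rows[i::k] for i in range(k)]
    aucs = []
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        rates, overall = fit_group_rates(train)
        scores = [rates.get(key, overall) for key, _ in test]
        aucs.append(roc_auc([y for _, y in test], scores))
    return sum(aucs) / k

rng = random.Random(0)  # fixed seed so the run is repeatable
rows = make_synthetic_rows(1000, rng)
mean_auc = cross_validated_auc(rows)
print(f"mean held-out ROC AUC: {mean_auc:.2f}")
```

 An AUC of 0.5 would mean the variables tell you nothing about the target on unseen rows. Shuffling the labels before calling `cross_validated_auc` is a quick sanity check: the score should then fall back to roughly 0.5.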
 >> Domain experts get beaten on Kaggle, because they think they need other variables, or that some variables (and their interactions) can't possibly work. That sounds a bit iffy. A domain expert should really know what they're talking about, or they're not a domain expert. If the real deal gets beaten on Kaggle it must mean that Kaggle is wrong, not the domain expert. Not that domain experts are infallible, but if it's a systematic occurrence then the problem is with the data used on Kaggle, not with the knowledge of the experts. I mean, the whole point of scientific training and research is to have domain experts who know their shit, know what I mean?