Titanic Revisited: All Hands on Deck! (EU2018 202)


Level: Beginner

Alfredo López Navarro, Web Intelligence Knowledge Manager, Digital Command Center, Telefónica Germany

We would like to provide a compelling description of some little-known facts about the disaster of RMS Titanic, which sank in 1912 after colliding with an iceberg. We all know the story, or so we think. Thanks to Telefónica’s Discovery Methodology, powered by JMP, we will embark on an expedition diving deep into a sea of data points to rescue insights from oblivion. We’ll scrape data using the "Internet Open" feature. We will join, concatenate and update tables until we create an enhanced database capable of providing plenty of insights. Then, by means of ANOVAs and categorical and visual analysis, we’ll tell a fascinating data-driven story. Expect a last-minute surprise! Of course, it’s going to be all hands on deck: everything is documented and explained, and will be uploaded to the JMP User Community. We would like to open a conversation for others to build on top of this. Come on board with us!

The attached files are:

  • The import script we used to get the lifeboat information from the Encyclopedia Titanica. However, to access the full data you need to be registered with the Encyclopedia; otherwise you only get basic information.
  • Our Journal and data tables in an archived JMP Project file, to be used in JMP 14. Projects were introduced in JMP 14, so the file will not open or work in JMP 13 or earlier versions. Some models saved in the file were created with JMP Pro, so the saved scripts for these models will not execute in standard JMP.
  • A .zip file for JMP 13, including the Journal and the related data tables. As with the JMP 14 project, some models will only run in JMP Pro. However, this has no impact on the analyses called from within the Journal.



The way we present the story goes beyond JMP but uses JMP's analyses and reports. This is because, at Telefónica, the analysis results need to be incorporated into other reports, too. To get a better impression of what an informed graph based on the Titanic dataset could look like, you might want to start by reading this booklet.


Life is messy, and so is data
We thought this was going to be easy. It was not. At the beginning, JMP's "Internet Open" data scraper (ImportScript.jsl, attached) worked fine, but reality, as always, proved trickier: we discovered that the data were not consistent between different sources and, worst of all, that there were missing data.

(Note: The Encyclopedia Titanica has changed its download rules. Unfortunately, you now have to subscribe to get the full data. Without a subscription, the script only provides Name, Age, Class/Department, and a link to the picture, which is far too little information for a thorough analysis. The full dataset includes much more information; we will add it here soon.)


We strive for “tidy data” that conform to the third normal form of relational databases:

  1. Each variable forms a column.
  2. Each observation forms a row.
  3. Each type of observational unit forms a table.
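The three rules above can be illustrated with a minimal Python sketch. It is not part of our JMP workflow; the table layout, the column names and the counts are invented purely for illustration. It reshapes a "wide" table, where one variable (passenger class) is spread across several columns, into a tidy one with a single observation per row.

```python
# Hypothetical "messy" table: one row per lifeboat, one column per passenger class.
# All counts are invented for illustration only.
wide = [
    {"lifeboat": "A", "first": 3, "second": 0, "third": 8},
    {"lifeboat": "B", "first": 5, "second": 2, "third": 1},
]


def to_tidy(rows, id_col, value_cols, var_name, value_name):
    """Return one row per (id, variable) pair, i.e. one observation per row."""
    tidy_rows = []
    for row in rows:
        for col in value_cols:
            tidy_rows.append(
                {id_col: row[id_col], var_name: col, value_name: row[col]}
            )
    return tidy_rows


tidy = to_tidy(wide, "lifeboat", ["first", "second", "third"], "pclass", "passengers")
for row in tidy:
    print(row)
```

In the tidy form, "pclass" is a single column (rule 1) and each lifeboat/class combination is its own row (rule 2), which is the shape most analysis platforms expect.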

Fixing data quality took months of burning the midnight oil, even with JMP functionality such as Recode and data table tools such as Join and Concatenate. This was mainly because the data had to be proofed manually for correct matches: names and IDs didn't always match, and some people who were meant to be on board weren't. Our goal was to get a very solid data table as the basis for our further analysis.
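To give an idea of why matching names across sources is tedious, here is a small Python sketch of a key-based join that sets aside unmatched rows for manual proofing. It is an illustration under assumptions, not our actual JMP procedure: the helper names `name_key` and `left_join`, the column names and all records are hypothetical.

```python
import unicodedata


def name_key(name):
    """Build a crude matching key: drop accents, case, punctuation, extra spaces."""
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in ascii_name)
    return " ".join(cleaned.lower().split())


def left_join(left, right, key_col):
    """Join on the normalized name; return (matched rows, rows needing manual review)."""
    index = {name_key(r[key_col]): r for r in right}
    matched, unmatched = [], []
    for row in left:
        other = index.get(name_key(row[key_col]))
        if other is None:
            unmatched.append(row)
        else:
            matched.append({**other, **row})  # values from the left table win
    return matched, unmatched


# Invented records from two hypothetical sources.
passengers = [
    {"name": "DOE, Mr. John", "pclass": 1},
    {"name": "Müller, Miss Anna", "pclass": 3},
    {"name": "Roe, Mrs Jane", "pclass": 2},
]
lifeboats = [
    {"name": "Doe, Mr John", "lifeboat": "B"},
    {"name": "Muller, Miss Anna", "lifeboat": "A"},
]

matched, unmatched = left_join(passengers, lifeboats, "name")
print(matched)
print(unmatched)
```

A normalized key catches spelling and punctuation variants, but it cannot tell whether an unmatched person was simply not assigned to a lifeboat or is recorded under a different name, which is why the unmatched rows still have to be checked by hand.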


Classical view of the Titanic disaster: Rich, Women, and Children survived

Make a short list in your mind: what are the first things that come to mind when you think about the Titanic disaster? Probably something like "the rich, women, and children survived". After the troublesome work of data preparation, we can now use JMP’s fantastic exploratory data analysis tools to understand the key drivers of the survival rate.

First, we don’t see a big difference between the age distributions of those who died (RIP) and those who survived. Let’s dig further and look at the split by class and by women/children/men. Females and the higher classes have a better survival rate. Let’s dig even deeper: this is not true for all females and children, as those in 3rd class have a worse survival rate. A simple model using gender, age and wealth gives a good result and confirms this view. However, we thought this could not be the full truth, and after all our work on data preparation we were especially reluctant to believe it.
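The class and gender split described above boils down to a grouped survival rate. The following Python sketch shows the calculation on a handful of invented records; the column names and values are assumptions and do not come from our data table.

```python
from collections import defaultdict


def survival_rates(records, keys):
    """Return the share of survivors for every combination of the given keys."""
    counts = defaultdict(lambda: [0, 0])  # group -> [survivors, total]
    for rec in records:
        group = tuple(rec[k] for k in keys)
        counts[group][0] += rec["survived"]
        counts[group][1] += 1
    return {group: s / n for group, (s, n) in counts.items()}


# Invented toy records (survived: 1 = yes, 0 = no).
records = [
    {"pclass": 1, "sex": "female", "survived": 1},
    {"pclass": 1, "sex": "female", "survived": 1},
    {"pclass": 1, "sex": "male", "survived": 0},
    {"pclass": 1, "sex": "male", "survived": 1},
    {"pclass": 3, "sex": "female", "survived": 1},
    {"pclass": 3, "sex": "female", "survived": 0},
    {"pclass": 3, "sex": "male", "survived": 0},
    {"pclass": 3, "sex": "male", "survived": 0},
]

by_class_and_sex = survival_rates(records, ["pclass", "sex"])
for group, rate in sorted(by_class_and_sex.items()):
    print(group, f"{rate:.0%}")
```

Looking at one factor at a time (only "sex" or only "pclass") can hide interactions; passing both keys shows how the rate for women depends on class.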



New Approach: New Variables and the mentality in those times

We added more variables to the traditional analysis, e.g. "ORIGIN". This should be taken with a grain of salt, because nationality, ethnic origin and place of residence are not disjoint sets. It is therefore a very subtle topic, especially in those imperial times.

[Figure: JMP_Titanic_Origin.jpg]

Another important factor we took into consideration was the mentality of that era, which had an incredible impact on the evacuation during the disaster. Without the sacrifice of the engineers who kept Titanic running after she hit the iceberg, the ship would have sunk much faster and far more people would have died. Without the drilled navy officers and their mentality of following orders as strictly as possible, many more people might have been saved. A navy officer on the port side who treated the order "Women and children first" as "Women and children only" had an impact on the number, gender and class of the passengers evacuated from the port side of the ship. Let's look further into the era.


Blue Riband

As Titanic was on her maiden voyage, careful attention had to be paid to the machinery, particularly the reciprocating engines, and these will have been run up gradually to full power over the duration of the voyage. The full power run planned for 15 April was not an attempt to break any record but simply to check that the engines could achieve the designed power consistently. There was no chance of Titanic winning the "Blue Riband"; she was simply not powerful enough. Mauretania and Lusitania required some 75,000hp to propel their smaller tonnage at record speeds, and the most that Titanic’s engines could produce was about 45,000hp.




It has been suggested that Murdoch, on the bridge, rang the engine room telegraphs to "Full astern" for both engines soon after the iceberg was sighted, and by all accounts it took between 30 and 40 seconds from that time until impact. It is unlikely, therefore, that the engines were going full astern before the impact; the iceberg was too close for the engines to have any influence upon the collision. Without doubt the engines did stop, but there was insufficient time for them to have any effect before the collision. The engines responded quickly to the controls, as can be seen from the trials information, but they were still going ahead at the time of impact, as it took time for the engineers to reach the controls and then further time for the engines to react.


The Blue Riband is an unofficial accolade given to the passenger liner crossing the Atlantic Ocean in regular service with the record highest speed. The term was borrowed from horse racing and was not widely used until after 1910. Traditionally, the record is based on average speed rather than passage time because ships follow different routes. Also, eastbound and westbound speed records are reckoned separately, as the more difficult westbound record voyage, against the Gulf Stream and the prevailing weather systems, typically results in lower average speeds.


Most ships flew the Red Duster of the merchant marine. Because of his position as a Commander in the Royal Naval Reserve, Captain Smith had the distinction of being able to fly the Blue Duster of the R.N.R.

The plain Blue Ensign was permitted to be worn by three categories of civilian vessel. One of these is British merchant vessels whose officers and crew include a certain number of retired Royal Navy personnel or Royal Navy reservists, or which are commanded by an officer of the Royal Naval Reserve in possession of a Government warrant.


HMS Dreadnought was a battleship built for the Royal Navy that revolutionised naval power. Her name, and the type of the entire class of warships named after her, stem from archaic English, in which "dreadnought" means "a fearless person". Dreadnought's entry into service in 1906 represented such an advance in naval technology that her name came to be associated with an entire generation of battleships, the "dreadnoughts", as well as the class of ships named after her. The generation of ships she made obsolete became known as "pre-dreadnoughts". Admiral Sir John "Jacky" Fisher, First Sea Lord of the Board of Admiralty, is credited as the father of Dreadnought. Shortly after he assumed office, he ordered design studies for a battleship armed solely with 12-inch (305 mm) guns and a speed of 21 knots (39 km/h; 24 mph). He convened a "Committee on Designs" to evaluate the alternative designs and to assist in the detailed design work. One ancillary benefit of the Committee was that it would shield him and the Admiralty from political charges that they had not consulted leading experts before designing such a radically different battleship.

Dreadnought was the first battleship of her era to have a uniform main battery, rather than having a few large guns complemented by a heavy secondary armament of smaller guns. She was also the first capital ship to be powered by steam turbines, making her the fastest battleship in the world at the time of her completion. Her launch helped spark a naval arms race as navies around the world, particularly the German Imperial Navy, rushed to match her in the build-up to World War I.


Titanic was not meant to break world records, even though there was a race for faster Atlantic crossings.

Titanic sailed under a military flag because of the sergeants and officers on board.


New variables: Trim, Fate, Wealth, Survival_Trim

By introducing new variables, we could turn the typical categorical analysis of the Titanic data into an analysis of continuous variables. Trim was used to make clear that you cannot just say Survived/RIP: it makes a difference whether you were first or last in line during the evacuation. Fate shows that there were people who should have survived but sacrificed themselves, such as engineers among the passengers, priests, musicians, women with children, ...




Modeling with the new variables allowed us to find models with a much better AUC (up from 0.72 for the original model to 0.85). However, why couldn't we find an even better model? We looked at the misclassification rates, and especially at the misclassified passengers who should have survived according to the model but died. With more informed data in XXX, we found that many of these misclassified passengers refused to leave the ship, sacrificed themselves, or simply didn't speak English and were lost on the ship.
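For readers less familiar with the AUC, the Python sketch below shows one way to compute it and to list the cases we looked at most closely: passengers the model expected to survive but who died. It is not the JMP model itself; the passenger IDs, labels, scores and the 0.5 threshold are invented assumptions. The AUC is calculated here as the share of (survivor, non-survivor) pairs in which the survivor received the higher score, with ties counted as half.

```python
def auc(labels, scores):
    """AUC as the probability that a random survivor outscores a random non-survivor."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def predicted_survivors_who_died(ids, labels, scores, threshold=0.5):
    """Return the IDs of passengers scored at or above the threshold who did not survive."""
    return [i for i, y, s in zip(ids, labels, scores) if y == 0 and s >= threshold]


# Invented toy data: 1 = survived, 0 = died; scores are predicted survival probabilities.
ids = ["p1", "p2", "p3", "p4", "p5", "p6"]
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

model_auc = auc(labels, scores)
to_review = predicted_survivors_who_died(ids, labels, scores)
print(round(model_auc, 3), to_review)
```

The second function returns exactly the kind of list that is worth reading case by case, because each entry is a person whose story the variables in the model did not capture.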




You'll find further insights in the Journal.