Mapping MLBAM Player IDs to the Lahman Database

After Downloading Gameday Data, I wanted to make a short post about translating the Lahman database into JSON. The purpose is so that I can compare season stats from Lahman with at-bat outcomes from MLB Gameday.

If you just want to download the JSON translations, check out JSONLahman on GitHub. The


Here’s an example of what a player’s entry looks like:


The properties with format “KEY_XYZ” are the player IDs from a variety of websites. The ones that I’m using are KEY_MLBAM and KEY_LAHMAN but I’ve included Fangraphs, BBRef, and Retrosheet for the fantasy baseball enthusiasts.

A Bit of Java Code

The first step was to get a list of all players in the MLBAM database. The Chadwick Bureau has an exhaustive and reliable list of these players, but unfortunately doesn’t include the Lahman player IDs in their database. I then wrote a bit of R code that merges this csv file with the Master.csv file from Lahman based on the Retrosheet ID.

people <- read.csv('people.csv')
master <- read.csv('master.csv')
people <- people[, c("key_mlbam", "key_bbref", "key_retro", "key_fangraphs")]
master <- master[, c("retroID", "throws", "bats", "nameFirst", "nameLast", "key_lahman", "birthYear", "birthMonth", "birthDay")]
merged <- merge(x=people, y=master, by.x = "key_retro", by.y = "retroID")
write.csv(merged, file="merged.csv")

I then imported the CSV into Razor SQL to build a Derby database that I can embed into a Java application. I could have used SQLite but I love Apache’s Java libraries.

ResultSet res = stmt.executeQuery("SELECT * FROM PLAYERS");
 File file = new File(baseLocalURL+"\\players.json");
 writeTableToFile(res, file);

You can see all the code in the GitHub repo linked above. It’s not the best code but the goal was just to get the output.