When working with big data, especially open government data, we tend to assume things will be easy. You just go to the parliament's website and download whatever they've posted or, if you wanna get fancy, you do some programming with their API and "presto!" pretty, ready-made data.
But, alas, nope. When has the world ever been that easy? When has science ever been that easy? When has political science ever been that easy? Rule of thumb: if it's easy, it's probably wrong.
I guess the people on the providing side don't really work with the data, not like we do, and they don't consider the many ways in which we might need and use it. Which means I've been having a lot of issues with data standardisation. For each type of legislative action I want to look into, names are formatted differently, for instance: LASTNAME, NAME; NAME, LASTNAME; TITLE (Lady, Sir, Baroness, Viscount) OF WHATEVER; HONORIFIC LASTNAME (Mr, Miss, Ms, Mrs, Dr). Moreover, these aren't standardised per person. One example is MP Dan Poulter (Conservative), who sometimes appears as "Dan Poulter", sometimes as "Dr Poulter", sometimes as "Poulter, Dan", and, to make matters worse, as "Poulter, Dr". John Thurso (LibDem) appears as "Viscount Thurso", "Thurso, Viscount", "John Thurso", and "Thurso, John". And, to add insult to injury, names aren't standardised within files either: the same person may appear under different names in a single file.
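If you'd rather not do all of this by hand, here is a minimal Python sketch of the kind of cleanup described above. It assumes just three rules: flip "LASTNAME, NAME" into "NAME LASTNAME", drop anything after " of " (territorial titles), and strip a hypothetical, hand-picked list of titles and honorifics. The helpers `normalise_name` and `could_be_same` are made up for illustration; they are not part of any package.

```python
import re
from typing import Optional, Tuple

# Hypothetical list of titles/honorifics to strip; extend it as you find more.
HONORIFICS = {"mr", "mrs", "miss", "ms", "dr", "sir", "dame",
              "lady", "lord", "baron", "baroness", "viscount"}

def normalise_name(raw: str) -> Tuple[Optional[str], str]:
    """Return (first name or None, surname), both lowercased."""
    name = raw.strip()
    # Drop territorial designations such as "... of Somewhere".
    name = re.split(r"\s+of\s+", name, maxsplit=1, flags=re.IGNORECASE)[0]
    # "Poulter, Dan" -> "Dan Poulter"; "Poulter, Dr" -> "Dr Poulter".
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        name = f"{first} {last}"
    # Lowercase, remove trailing dots ("Dr." -> "dr") and strip honorifics.
    tokens = [t.strip(".").lower() for t in name.split()]
    tokens = [t for t in tokens if t not in HONORIFICS]
    if not tokens:
        raise ValueError(f"nothing left after stripping titles: {raw!r}")
    first = tokens[0] if len(tokens) > 1 else None
    return first, tokens[-1]

def could_be_same(a: str, b: str) -> bool:
    """Same surname, and first names agree or at least one is missing."""
    (fa, la), (fb, lb) = normalise_name(a), normalise_name(b)
    return la == lb and (fa is None or fb is None or fa == fb)
```

For example, `could_be_same("Dr Poulter", "Poulter, Dan")` and `could_be_same("Thurso, Viscount", "John Thurso")` both come out `True`. Treat a surname-only match as a candidate to check, not a confirmed match: two MPs can share a surname, and multi-word surnames would need extra handling.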
And there are also many, many, many missing values, that is, MPs without a party assigned to them. For those, I went to the oracle (excuse me, I did a Google search) and filled in the blanks. That's over 200 MPs.
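Before filling in the blanks, it helps to list exactly which MPs are missing a party. Here's a small Python sketch, assuming the data is a CSV with a header row containing `name` and `party` columns (those column names are assumptions, so adjust them to your file). `load_rows` and `names_missing_party` are hypothetical helpers.

```python
import csv
from typing import Iterable, List, Mapping

def load_rows(path: str) -> List[dict]:
    # Assumes a UTF-8 CSV with a header row that includes 'name' and 'party'.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def names_missing_party(rows: Iterable[Mapping[str, str]]) -> List[str]:
    """List MPs whose 'party' field is absent, empty, or only whitespace."""
    return [row["name"] for row in rows if not (row.get("party") or "").strip()]
```

Something like `names_missing_party(load_rows("members.csv"))` then gives you the to-do list for your own searching. The file name is only a placeholder.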
In addition, there is only one file with data on the MPs: the members file. Which means that any other database you download or create (in R, with the hansard package created by the wonderful Evan Odell) will probably lack party and gender.
After A LOT of cleaning up (using Excel and its handy formulas) and putting together a file with MPs, parties, and gender to use as a base, I've decided to share that file here. Anyone is welcome to use it. Would a citation be nice? Sure. Mandatory? Meh. There are nearly 900 MPs compiled here - but since my own research only goes up to 2017, the recent defectors from Labour are still categorised as either Labour or Labour (Co-op). Anyone who switched parties is categorised under either the party they belonged to longest or the party they belonged to while in Parliament. Following common practice, men are coded 0 and women 1. I've tried to remove all titles and honorifics, and I hope I have. I truly hate them.
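If you're working outside Excel, attaching party and gender from a base file like this one to another dataset (say, a table of speeches) is basically a lookup. Below is a minimal Python sketch. It assumes both tables have a `name` column, that the base file also has `party` and `gender` columns (all assumed column names), and that names are already standardised. `match_key` only ignores case and extra spaces. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Mapping, Optional

def match_key(name: str) -> str:
    # Hypothetical match key: case-insensitive, with runs of whitespace collapsed.
    # It only helps once names are already in one consistent format.
    return " ".join(name.split()).casefold()

def attach_party_gender(records: Iterable[Mapping],
                        base: Iterable[Mapping]) -> List[Dict]:
    """Copy each record, adding 'party' and 'gender' looked up from the base file.

    Unmatched names get None for both fields so they are easy to spot.
    If the base file has the same key twice, the later row wins.
    """
    lookup = {match_key(mp["name"]): mp for mp in base}
    enriched = []
    for rec in records:
        mp = lookup.get(match_key(rec["name"]))
        row = dict(rec)  # don't modify the caller's data
        row["party"] = mp["party"] if mp is not None else None
        # Coding as in the base file: 0 = man, 1 = woman.
        row["gender"] = int(mp["gender"]) if mp is not None else None
        enriched.append(row)
    return enriched

def share_of_women(records: Iterable[Mapping]) -> Optional[float]:
    # With men coded 0 and women 1, the mean of the codes is the proportion
    # of (matched) records attributed to women.
    codes = [r["gender"] for r in records if r.get("gender") is not None]
    return sum(codes) / len(codes) if codes else None
```

The 0/1 coding is handy here: averaging it gives a proportion directly. But if an MP appears in many records, the result is the share of *records* by women, not the share of MPs.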
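After downloading this, you can use an OR formula in Excel to categorise your data; note that OR accepts at most 255 arguments (at least in the 2016 version), so you'll have to do it in chunks. (E.g. =OR(A2=$B$2; A2=$B$3; A2=$B$4; A2=$B$5)=TRUE, which is TRUE when A2 matches any of the names in B2:B5. Depending on your regional settings, the separator may be a comma instead of a semicolon.)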
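The OR formula above is really a membership test: "is A2 one of these names?" In Python, a set does the same check with no argument limit, so there's no chunking. Here's a small sketch. `flag_in_list` is a hypothetical helper, and it casefolds both sides to mirror Excel's case-insensitive `=` on text. (In Excel itself, `=ISNUMBER(MATCH(A2; $B$2:$B$900; 0))`, with the range adjusted to your sheet, should also get around the chunking, since MATCH with 0 does an exact lookup over a whole range.)

```python
from typing import Iterable, List

def flag_in_list(names: Iterable[str], reference: Iterable[str]) -> List[bool]:
    """True for each name found in reference, like =OR(A2=$B$2; A2=$B$3; ...)."""
    # Excel's text comparison ignores case, so casefold both sides to match it.
    lookup = {r.casefold() for r in reference}  # a set has no argument limit
    return [name.casefold() in lookup for name in names]
```

For instance, `flag_in_list(["Dan Poulter", "Someone Else"], ["dan poulter"])` gives `[True, False]`. Run it once per party list to categorise a whole column.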