FOIA Social Network Prototype Using Neo4J

This is the beginning of a prototype that visually maps FOIA investigations by reporters into people in city government. It uses a data set from the City of Chicago data portal and a program called Neo4J, which creates a graph database and also produces some pretty fun, amazing visualizations. This is the very early stage, and I’ve only added a few layers of information to the social network so far, but I’ll show you what’s going on.

The pink nodes are reporters: A. Castillo, J. Wise, M. Scarpace. The purple nodes are politicians, plus a few other entities that came out of extracting proper names from the spreadsheets that supplied the information. Here you see “City Colleges”, here “Latina Affairs”, and here the “Advisory Council”. So obviously I have a little more work to do to get the proper name extractor working properly.

You can also see these little vectors (Neo4J calls them relationships). If I expand this one out, it reads “investigated by”: Richard Mell was investigated by A. Castillo. Rahm Emanuel was investigated by A. Castillo. Rahm Emanuel was also investigated by Jay Wise. It’s nice that you can put directionality into these vectors.

If we move around, you can see that some reporters have been investigating a lot of different people. Really busy reporters. Keep in mind that I’ve only added some of the people in the FOIA requests so far. This one, “Financial Interests”, will ultimately not be an entity in the database; I have to add phrases like it to my stopwords so the proper name extractor drops them. Still, four different reporters put in FOIA requests for “Financial Interests”, so you can see some of these connections. Here you can also see “R. Emmanuel”, which shows another problem in the proper name extractor: entity matching. Rahm Emanuel and R. Emmanuel need to be aligned into the same database entry. So it’s pretty interesting to see these interweaving paths of reporters and the people they’re FOIAing.

The FOIA data sits in three different columns of the spreadsheet: one is politicians, and then there are two other columns. I haven’t had a chance to map names onto some of the other clusters yet, and those come from the other columns of data. So there are other clusters in the network; let’s see if I can find one that’s really big. Here is one of the really big clusters, 816, so I have to figure out who that is. It’s pretty fascinating! And here is the really, really big one. I’m curious who 802 is, because this is a reporter who has been doing a lot of work investigating all sorts of people.

I can also add who each reporter works for. I have all the organizations in another column, so I can map them to a “Works For” vector. Then there will be connections between the organizations, the reporters who work for them, and the people getting investigated. It’s really fascinating stuff that I’m excited to do more work with.
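
The two extractor fixes described above (stopwords and entity matching) and the directed “investigated by” vector can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the code behind the prototype: the stopword list, the alias table, the function names, and the sample rows are all hypothetical, and it keeps the edges as in-memory tuples instead of writing them to Neo4J.

```python
from collections import defaultdict

# Hypothetical stopwords: extracted phrases that are not real entities.
STOPWORDS = {"financial interests"}

# Hypothetical alias table: variant spellings mapped to one canonical name.
ALIASES = {
    "r. emmanuel": "Rahm Emanuel",
    "rahm emmanuel": "Rahm Emanuel",
}


def canonical_name(raw):
    """Return the canonical entity name, or None if the phrase is a stopword."""
    cleaned = " ".join(raw.split())  # collapse stray whitespace
    key = cleaned.lower()
    if key in STOPWORDS:
        return None
    return ALIASES.get(key, cleaned)


def build_edges(rows):
    """Turn (reporter, raw subject) rows into directed edges.

    Each edge is (subject, "INVESTIGATED_BY", reporter), so the direction
    runs from the person investigated to the reporter.
    """
    edges = set()
    for reporter, raw_subject in rows:
        subject = canonical_name(raw_subject)
        if subject is not None:
            edges.add((subject, "INVESTIGATED_BY", reporter))
    return edges


def reporters_by_subject(edges):
    """Group reporters under each subject they investigated."""
    grouped = defaultdict(set)
    for subject, _relationship, reporter in edges:
        grouped[subject].add(reporter)
    return dict(grouped)


# Illustrative rows only. The variant spelling in row three and the whole
# of row four are made up to exercise the alias and stopword handling.
ROWS = [
    ("A. Castillo", "Richard Mell"),
    ("A. Castillo", "Rahm Emanuel"),
    ("J. Wise", "R. Emmanuel"),
    ("M. Scarpace", "Financial Interests"),
]

if __name__ == "__main__":
    edges = build_edges(ROWS)
    for subject, reporters in sorted(reporters_by_subject(edges).items()):
        print(subject, "<-", sorted(reporters))
```

With these sample rows, “Financial Interests” is dropped before it becomes a node, and both spellings collapse onto a single Rahm Emanuel entry. A hand-maintained alias table is the simplest possible form of entity matching; a larger data set would likely need fuzzy matching instead.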
