Data Viz Workshop

Tutorial One

Consumer Financial Protection Bureau

Getting Started

In this tutorial, we'll begin exploring the CFPB dataset using RAW, a data visualization tool developed by Density Design. RAW lets you copy and paste CSV data into a window, select the type of visualization you'd like to design, and then refine it by dragging and dropping the variables of your choice.

RAW is particularly useful for two reasons: it is a great introduction to many of the data visualizations we see more and more often in the news and elsewhere, and it provides you with embeddable code. Working with that code is an easy way to better understand how data visualization libraries work (in this case, D3): you can practice modifying the code yourself and return to the visualization at any time to see how your changes affect the final design.

When you have time, you can see RAW in action here:

Step One: Getting the Data

Now that we know what tool we're going to use for this tutorial, let's go to the CFPB website and get some data! Open your web browser, and go to http://www.consumerfinance.gov/complaintdatabase/. Scroll down the page until you see the button that reads "Download Options and API".

Important Note: The CFPB website already provides some pretty great data visualization tools through the Socrata application. If you wanted to, you could stop here and simply click the "View Complaint Data" button to begin exploring the data. Since we want to not only demonstrate the richness and flexibility of the data on the site but also ways that you can visualize the information outside of the embedded tools, we're going to continue manually downloading the data and visualizing it elsewhere.

Click on the "Download Options and API" button. In the next section you'll be presented with an option to download the actual, raw data itself.

For this tutorial, we're going to work with the student loan data. On the screen, select the "Student Loan" radio button, then from the first download option select "Only complaints with a consumer narrative", and finally make sure the file format is "CSV". This will give us a file that is easy to work with in both MS Excel and RAW. Click the download button.

Now that we have our data (check your downloads folder to make sure it's there), go ahead and open the dataset. If you have access to MS Excel, open the program first, then go to File --> Open, navigate to the file, and make sure "All Files" is selected in the dropdown menu (otherwise MS Excel may behave as though it can't open the CSV, even though it can).
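
If you don't have MS Excel handy, the same first look can be done with a short script. The sketch below uses Python's standard csv module. The file name in the comment and the three column names in the sample are assumptions for illustration, so check them against the header of the file you actually downloaded.

```python
import csv
import io

def summarize_csv(handle):
    """Return (column_names, row_count) for an open CSV file handle."""
    reader = csv.DictReader(handle)
    row_count = sum(1 for _ in reader)  # counts data rows, not the header
    return reader.fieldnames, row_count

# Tiny in-memory stand-in for the download (assumed column names).
sample = io.StringIO(
    "Date received,Issue,Company\n"
    "1/5/15,Can't repay my loan,Navient\n"
    "12/30/14,Getting a loan,Discover\n"
)
columns, rows = summarize_csv(sample)
print(columns, rows)

# With the real download (hypothetical file name), this would look like:
# with open("student_loan_complaints.csv", newline="", encoding="utf-8") as f:
#     columns, rows = summarize_csv(f)
```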

Excellent! Move the dataset to a good working folder location that you'll remember for this project, and you're ready to proceed to step two.

Step Two: Setting up the Data

Now that we have our dataset, let's continue! With the dataset open in MS Excel or another application, let's take note of a few things. First, this dataset is huge! With over 12,000 records going back to March 1st, 2012, we have a lot of data to look at. While there are data visualization methods that work well with extremely large datasets, we will want to use a subset of this data. Since we're doing some exploratory data visualization here, we're going to make some choices about what to look at first without the benefit of a broader analytical framework. For this exercise that's ok - but we would want to check our data and methods if, for example, we were making decisions on policy. Data visualizations are powerful, so we do need to take care to make sure we're using the data correctly.

Let's start by taking a subset of the data. Remember that we're already working with a subset of the student loan data - we've selected only records with a complaint narrative. Next, let's narrow the dataset down even further.

First, select the column entitled "Issue". In MS Excel, you'll note that the Data tab gives you the option to apply a filter. Go ahead and do that, keeping only the records whose "Issue" is "Can't repay my loan".
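
The same filter can be expressed in a few lines of Python. This is a minimal sketch, assuming the rows have been read with csv.DictReader and that the column is named "Issue" as it appears in the spreadsheet. The sample rows are made up for illustration.

```python
import csv
import io

TARGET_ISSUE = "Can't repay my loan"

def filter_by_issue(rows, issue=TARGET_ISSUE):
    """Keep only the rows whose "Issue" column matches the given issue."""
    return [row for row in rows if (row.get("Issue") or "").strip() == issue]

# Made-up sample rows standing in for the downloaded file.
sample = io.StringIO(
    "Date received,Issue,Company\n"
    "1/5/15,Can't repay my loan,Navient\n"
    "1/6/15,Getting a loan,Discover\n"
    "12/30/14,Can't repay my loan,Wells Fargo\n"
)
rows = list(csv.DictReader(sample))
matching = filter_by_issue(rows)
print(len(matching), "of", len(rows), "rows match")
```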

Let's also look only at records since the beginning of the year. Our data is already sorted by date, so scroll to the last record that has a date of 1/1/15 and select it by clicking on its row number. Next, go back to the top of the dataset and, while holding down the Shift key, click row number 1. You should now have a selection of only the records where "Issue" is "Can't repay my loan" and that were submitted since the beginning of the year. Copy the selection and paste it into a new sheet.
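
Selecting rows by hand relies on the sheet being sorted. As an alternative, the date cutoff can be applied directly. The sketch below assumes the date column is called "Date received" and that dates are written like 1/1/15 or 01/01/2015. Both are assumptions to verify against your file, and the sample rows are invented.

```python
from datetime import date, datetime

def parse_date(text):
    """Parse dates written like 1/1/15 or 01/01/2015; raise ValueError otherwise."""
    for fmt in ("%m/%d/%y", "%m/%d/%Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date: {text!r}")

def since(rows, start, column="Date received"):
    """Keep rows dated on or after `start`, regardless of sort order."""
    return [row for row in rows if parse_date(row[column]) >= start]

# Invented rows for illustration only.
rows = [
    {"Date received": "3/1/12", "Company": "Sallie Mae"},
    {"Date received": "12/31/14", "Company": "Discover"},
    {"Date received": "1/1/15", "Company": "Navient"},
    {"Date received": "02/14/2015", "Company": "Wells Fargo"},
]
recent = since(rows, date(2015, 1, 1))
print([row["Company"] for row in recent])
```

Because each row is compared against the cutoff individually, this approach gives the same result whether the file is sorted oldest first, newest first, or not at all.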

Step Three: Working with RAW

Now that we have the data we want, let's use RAW to start visualizing it. Open a new browser tab and go to http://raw.densitydesign.org. Next, click on the "Use it Now!" button. You're presented with a step-by-step walkthrough on how to begin building your data visualization. If you just want to start with one of the sample datasets provided, open the "choose one of our samples" pulldown menu and select a dataset of your choice. If you're ready to jump right in with the student loan data we created, simply copy the data from your spreadsheet and paste it into the input box.
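
If you filtered the data in a script rather than a spreadsheet, you'll need text you can paste into the input box. The sketch below serializes a few chosen columns as CSV text with a header row. The function name and the sample rows are hypothetical, and you may prefer to keep more columns than the two shown here.

```python
import csv
import io

def to_paste_text(rows, columns):
    """Serialize the selected columns as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row[col] for col in columns])
    return buffer.getvalue()

# Hypothetical filtered rows; the extra "State" column is dropped on output.
rows = [
    {"Issue": "Can't repay my loan", "Company": "Navient", "State": "NY"},
    {"Issue": "Can't repay my loan", "Company": "Asset Recovery Solutions, LLC", "State": "IL"},
]
text = to_paste_text(rows, ["Issue", "Company"])
print(text)
```

Using csv.writer rather than joining strings by hand matters here: company names that contain commas are quoted automatically, so they stay in a single column when the text is parsed.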

If the data was read correctly, you'll see a thumbs-up message at the bottom of the window telling you that the records were successfully parsed. Next, scroll down and select a visualization type that works well with your data. This step might involve a bit of trial and error, but it's a great way to understand how data visualizations work. Let's start with one that shows how the "Can't repay my loan" complaint breaks down by company, which might provide some insight into where these complaints are coming from. Leave the first choice selected (Alluvial Diagram), and scroll down.

Important Note: We haven't normalized anything here yet, so it's quite possible that the companies with the most complaints are simply the companies with the most student loans. Before drawing any conclusions, we'd want to do some further analysis.
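
To make the normalization caveat concrete, here is a small sketch that turns raw complaint counts into complaints per 10,000 loans. Every number and company name in it is hypothetical: the CFPB download does not include loan counts, so you would need to find portfolio sizes from another source.

```python
def complaints_per_10k_loans(complaints, loans):
    """Return complaints per 10,000 loans for companies with a known, nonzero loan count."""
    return {
        company: 10_000 * count / loans[company]
        for company, count in complaints.items()
        if loans.get(company)
    }

# HYPOTHETICAL complaint counts and loan portfolio sizes, for illustration only.
complaints = {"Company A": 258, "Company B": 86, "Company C": 5}
loans = {"Company A": 6_000_000, "Company B": 500_000}  # no figure for Company C

rates = complaints_per_10k_loans(complaints, loans)
for company, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{company}: {rate:.2f} complaints per 10,000 loans")
```

In this made-up example, Company A has three times as many complaints as Company B, yet Company B has the higher complaint rate once portfolio size is taken into account.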

Next, scroll down to the "Map Your Dimensions" section of the tool. Drag "Issue" and "Company" under Steps.
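
It can help to see what the diagram is counting. Conceptually, each band connecting two steps represents the number of records that share that combination of values. The sketch below tallies those combinations with Python's collections.Counter. It illustrates the idea only and is not RAW's actual implementation; the sample rows are made up.

```python
from collections import Counter

def flow_counts(rows, steps=("Issue", "Company")):
    """Count the records for each combination of values across the given steps."""
    return Counter(tuple(row[step] for step in steps) for row in rows)

# Made-up rows for illustration only.
rows = [
    {"Issue": "Can't repay my loan", "Company": "Navient"},
    {"Issue": "Can't repay my loan", "Company": "Navient"},
    {"Issue": "Can't repay my loan", "Company": "Discover"},
]
flows = flow_counts(rows)
for (issue, company), count in flows.most_common():
    print(f"{issue} -> {company}: {count}")
```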

Finally, customize your visualization. Set the width to 700, height to 1000 and the node width to 10. Leave the other settings as they are.

If you'd like to save your data visualization, download the graphic as a PNG file. Try placing the file in a document or presentation - or, you can copy and paste the code directly into a web page:

Labels from the embedded alluvial diagram (company, followed by its number of "Can't repay my loan" complaints):

Access Group: 1
ACS Education Services: 13
AES/PHEAA: 58
Asset Recovery Solutions, LLC: 4
Blitt and Gaines, P.C.: 1
Caine & Weiner Co. Inc.: 1
Capital Management Services, LP: 1
Citibank: 8
Coast Professional, Inc.: 1
ConServe: 2
Discover: 30
Dominion Law Associates, P.L.L.C.: 1
ECMC Group, Inc.: 6
Education Management Corporation: 1
Financial Asset Management Systems, Inc.: 2
First Associates Loan Servicing LLC: 19
Genesis Lending: 86
Great Lakes: 4
Heartland Payment Systems: 1
Higher Education Student Assistance Authority (HESAA): 11
Iowa Student Loan: 1
JPMorgan Chase: 33
KeyBank NA: 11
Lazega & Johanson LLC: 2
LendKey Technologies, Inc.: 1
Loan To Learn: 3
LTD Financial Services, L.P.: 1
MEFA: 2
MOHELA: 2
MRS BPO, L.L.C.: 1
National Enterprise Systems, Inc.: 5
Navient: 258
Nelnet: 5
Patenaude & Felix APC: 1
PNC Bank: 5
RBS Citizens: 1
Rhode Island Student Loan Authority: 1
Sallie Mae: 28
Second Alliance, Inc.: 1
SIMM Associates, Inc.: 2
Solomon and Solomon, P.C.: 2
Student Assistance Foundation: 1
Student Loan Finance Corporation: 3
SunTrust Bank: 1
Texas Guaranteed: 1
Texas Higher Education Coordinating Board: 2
Transworld Systems Inc.: 16
Tuition Options LLC: 3
U.S. Bancorp: 3
URS Holding, LLC: 3
Wells Fargo: 42
Weltman, Weinberg & Reis: 2

Can't repay my loan (total): 694