Using Neptyne to explore the population growth of the US States
As the United States continues to grow and develop, understanding the population changes of its individual states can provide valuable insights into the country’s overall progress and demographics. In this post, we will show we can use Neptyne, the programmable spreadsheet to import the population data, refine that data and visualize the growth in an appealing way in just a few lines of Python while taking advantage of the spreadsheet that Neptyne is.
Let’s start by importing the data:
def merge_rows(*tbls):
res = defaultdict(dict)
years = set()
for tbl in tbls:
for _, row in tbl.iterrows():
for year, v in row.items():
years.add(year)
if state == "United States":
continue
res[row['Name']][year] = int(v)
return years, {state: [pops.get(year) for year in years] for state, pops in res.items()}
def load_data():
tbls = pd.read_html(
"https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_historical_population"
)
years, states = merge_rows(tbls[0], tbls[2], tbls[3])
Populations!B1 = [["Population", *years],
*[[state, *population] for state, population in states.items()]]
The load_data() function in the code imports population data for each US state from a Wikipedia page. It then uses the merge_rows() function to process the data and create a table containing the population data for each state. This table is stored in a sheet called “Populations” in the spreadsheet.
The merge_rows() function takes in one or more tables of data, each represented as a Pandas DataFrame, and extracts the relevant data from each row, merging rows of the same state. it returns the years and the population table.
To create a cartogram using a spreadsheet, we can use the grid structure of the spreadsheet to represent the states. We start by assigning each state to a cell in the spreadsheet by entering its state code in that cell.
To display the state population for a certain year, we can fetch the population data from the Populations sheet and set the background color of each cell to correspond with the population of that state. This allows us to visualize the state populations using the grid structure of the spreadsheet.
Matplotlib has a nice way to pick colors on a scale:
cmap = LinearSegmentedColormap.from_list(‘rg’,[“r”, “y”, “b”], N=256)
Which creates a color scale from red through yellow to blue with 256 steps. Setting up the background color of the cell then becomes:
rel_pop = int(256 * (results[code] — lmn) / (lmx — lmn))
r, g, b, _ = [int(x * 255) for x in cmap(rel_pop)]
state.set_background_color(r, g, b)
Anything we can learn? The map from 1790 above is already quite interesting. The populations are closer to each other than today and New York is not particularly dominant. A hundred years later things look different!
It’s just after the Civil War and we’re hitting peak industrialization and you can see it in the population patterns. Indiana has more people than Texas, Florida and California combined. The overall population has exploded from 3.8 million to ten times that.
Jump another 150 years to our current time and we can see a dramatic shift to the south and to the coast. California, Texas and Florida are the heavy hitters now and especially the states in the middle, up from Wyoming all the way down to Mississippi have grown a lot less than most.
If you want to play with the data or the visualization, hop on over to Neptyne — if you don’t have an account get one here.