
Evaluating the changes in art museums

Lindsay Hardy and Emily O'Connell

Goals

For our project, we will analyze a dataset describing the collection of the Museum of Modern Art (MoMA). Over the past twenty years, MoMA has gone through two expansions, the first finished in 2004 and the second in 2019. Our analysis focuses mainly on the changes that began in 2014.

In 2014, MoMA revealed its plans to expand and add 47,000 square feet to its gallery space. According to MoMA, this addition would allow for new and innovative displays and a reorganization of the museum. With this expansion, MoMA set the distinct goals of "diversifying the canon, embracing the present, and showing 1,000 more of its artworks." Through this, the curators planned to spotlight the work of women, African American, Asian, and Latino artists who had previously been overlooked in the collection. “We feel that many, many regions that once seemed peripheral don’t seem that way any more, they seem central. Figures who once seemed secondary now seem primary,” said MoMA chief curator Ann Temkin. To do this, MoMA has had to work actively over the past years to diversify its collection so that it can meet the goals it laid out for itself.

With MoMA's goals in mind, our overarching question is whether MoMA fulfilled its goal of expanding the diversity of its art collection. To see whether this shift is reflected in its collection and acquisitions, we will attempt to measure the diversity a work adds to the collection. We will do this by analyzing relevant attributes of the artwork, such as the artist's nationality, the artist’s gender, the year the work was created, the year it was acquired, and various other factors. It is impossible to truly quantify the abstract concept of diversity. Even so, we believe that analyzing overall patterns in acquisitions will help determine whether MoMA has succeeded in its goal of introducing new viewpoints to its collection.

The questions that we plan to answer:

  1. Is there an overwhelming number of male artists represented?
  2. Has MoMA recently been acquiring artwork by non-American artists at a higher frequency than in the past?
  3. Is the artwork acquired from non-American artists from Western European countries or elsewhere?
  4. How does the average age of a work of art change in different time periods?

By answering these smaller questions, we will be able to make a judgement on our overall research question of assessing the diversity of the collection in recent years. We will also look at trends over time to see whether the kind of artwork MoMA typically collects has changed, and note any differences between time periods. We found a rich and detailed dataset of 130,262 works of art owned by MoMA and acquired from 1929 to 2017. Although our MoMA dataset stops at 2017, we believe we can still conduct a thorough analysis of MoMA's goals. Because MoMA laid out its goals in early 2014, we have around three years of data to compare with its previous activity. In all, we have a lot of information about the type of artwork MoMA acquired in different time periods.

Comparison with other museums

In addition to our analysis of the MoMA dataset, we also plan to compare MoMA’s collection to other large modern/contemporary art museums, such as the Tate, a modern art museum in London. Comparing the Tate and MoMA will show whether there are differences in collections and trends between museums in two different countries. The Tate's collection is smaller than MoMA's, at about 70,000 works, and our dataset represents only 69,274 of those works. Because of this, we will need to be mindful of the difference in size whenever we compare the two datasets.

We will also investigate the collection of the Metropolitan Museum of Art, another famous art museum in New York City. The Met is not specifically focused on modern art, but we can isolate its modern/contemporary art collection and focus only on those works. The Met has far fewer modern/contemporary works than MoMA, only 13,452, but we believe this still gives us some insight into the differences and similarities between the two museums. By comparing MoMA to these other museums, we will see whether MoMA reached its goals. We will also see whether those goals place it as a forward thinker in modern art rather than just part of the crowd.

MoMA, the Tate, and the Met all have public GitHub repositories and open source data that we took advantage of.

Outline

We have laid out our workflow below. We start by cleaning the MoMA data and then analyze it on its own to answer the questions stated above. Next, we clean the Tate data and compare it to the MoMA data. Afterwards, we clean the Met data and run a similar comparison between the Met and MoMA. Finally, a conclusion section at the bottom states our overall findings and what we believe the analysis has shown.

Collaboration Plan

In terms of collaboration, we have set up a GitHub repository for storing the datasets, sharing code, and version control. We intend to meet on Zoom once a week on Tuesdays at 3:10, right after class, and will be flexible about adding a second meeting on Thursdays after class. During these meetings we plan to check our progress, solve problems, and ask questions. We will also communicate outside of scheduled meetings as needed.

MoMA Data Extraction, Cleaning, and Loading

The first thing we needed to do was unzip our files. We originally tried to pull the data from MoMA's GitHub repository, but MoMA has a no-pull policy, so instead we downloaded the data from Kaggle.

We unzipped the files and placed them in our working folder.
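Extracting the downloaded archive can be done with Python's standard `zipfile` module. This is a minimal sketch rather than our exact code. The helper name `extract_archive` and the demo file names are hypothetical, and the example builds a throwaway archive so it runs without the real Kaggle download.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract every file in a .zip archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
        return sorted(zf.namelist())


# Demo on a throwaway archive (hypothetical file names) so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "moma_demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("Artworks.csv", "Title,Name\n")
        zf.writestr("Artists.csv", "Name,Gender\n")
    extracted = extract_archive(demo_zip, Path(tmp) / "data")
    extracted_exists = (Path(tmp) / "data" / "Artworks.csv").exists()
```

On the real download you would call something like `extract_archive("moma.zip", "data/moma")`, where both paths are placeholders.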

Next, we read in the Artworks file and the Artists file and combined them into a new dataframe called gender. We did this because we will be analyzing the gender and nationality of the artists and wanted that information in our main dataframe. We originally tried to merge the two dataframes on "Artist ID". On closer inspection, however, "Artist ID" could contain multiple IDs in the artworks dataframe (some rows had up to 5 IDs, others only 1) but not in the artists dataframe. Because of this, we merged on "Name" instead, since it is consistent across both dataframes. Once we merged, we renamed the dataframe "moma". The resulting dataframe contains the title of each work, the artist and their background, information on the medium and various measurements, when MoMA acquired the work, and a few other minor fields. We want to use this dataframe to compare when works were acquired with when they were made, and to see whether the works acquired changed significantly after a certain year. We think the columns below will let us do all of this, but we need to make sure they are in the correct format.
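The join on "Name" can be sketched in pandas with small invented frames. The column names follow the text but should be treated as assumptions about the Kaggle files. We drop the artists' own ID column before joining so it does not collide with the artworks' ID column. `validate="many_to_one"` is an optional safety check we added: pandas raises `MergeError` if a name appears more than once in the artists table, which would otherwise silently duplicate artworks.

```python
import pandas as pd

# Toy stand-ins for the Artworks and Artists files; values are invented
artworks = pd.DataFrame({
    "Title": ["Untitled I", "Untitled II", "Untitled III"],
    "Name": ["Artist A", "Artist A", "Artist C"],
    "Acquisition Date": ["1990-05-01", "2015-03-12", None],
})
artists = pd.DataFrame({
    "Artist ID": [101, 205, 310],
    "Name": ["Artist A", "Artist B", "Artist C"],
    "Nationality": ["American", "French", "Japanese"],
    "Gender": ["Female", "Male", "Male"],
})

# In the real data "Artist ID" can hold several IDs per artwork, so join on "Name".
# Dropping the artists' ID column avoids _x/_y suffix collisions after the merge.
moma = artworks.merge(
    artists.drop(columns="Artist ID"),
    on="Name",
    how="left",                # keep every artwork, even without a matching artist
    validate="many_to_one",    # fail loudly if an artist name is duplicated
)
```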

Then we checked the data types. Every column had the right type except the dates, so we first converted Acquisition Date to a datetime.

While checking the different values, we realized that Date, the year the artwork was created, had many problems. It could not be analyzed because it mixed many different date representations, such as 1967-1977, c. 1989, or early 1992. Because of this, we had to fix the Date column, as shown below. Having this column be correct is very important to us because the date of creation is crucial to our analysis.

First we removed the obviously incorrect components, but then realized how extensive the dataset's issues were. So we used regular expressions to extract a date, shown below, and then checked the remaining values that were still not in the correct format.
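Below is a sketch of the kind of regular expression that pulls a usable year out of the messy Date strings. It illustrates the approach and is not necessarily the exact pattern we used. It assumes that the first four-digit year in the string (between 1000 and 2099) is the one to keep, so a range like 1967-1977 becomes its start year. Values with no year return None so they can be checked by hand. The helper name `extract_year` is hypothetical.

```python
import re

# A run of exactly four digits in 1000-2099 that is not part of a longer number
YEAR_PATTERN = re.compile(r"(?<!\d)(1\d{3}|20\d{2})(?!\d)")


def extract_year(raw):
    """Return the first plausible year in a free-text date, or None if there is none."""
    if not isinstance(raw, str):
        return None  # NaN and other non-string placeholders
    match = YEAR_PATTERN.search(raw)
    return int(match.group(1)) if match else None


samples = ["1967-1977", "c. 1989", "early 1992", "1960s", "n.d.", None]
years = [extract_year(s) for s in samples]  # [1967, 1989, 1992, 1960, None, None]
```

Applied to the dataframe, this would look like `moma["date_edit"] = moma["Date"].apply(extract_year)`. Rows that come back as None are the ones left to inspect manually.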

Because none of these remaining values contained meaningful date information, we could officially make the date_edit column our date column, which we do below.

Once all the values were dates, we converted them to datetime.

One of the main things we will look at is when MoMA acquired each work of art. Without this information, we cannot see whether MoMA really changed following its expansion. Because of this, we removed works that did not have an acquisition date from the original moma dataframe. We decided not to remove works missing a creation date for now, because we believe those works could still be useful.
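The datetime conversion and the removal of works without an acquisition date can be sketched in pandas as follows. The toy rows are invented. `errors="coerce"` turns values that cannot be parsed into `NaT`, so they are dropped together with genuinely missing dates. Extracting the year into a separate column is our own convenience for the yearly plots.

```python
import pandas as pd

# Toy rows; the real column is "Acquisition Date" as described in the text
moma = pd.DataFrame({
    "Title": ["Untitled I", "Untitled II", "Untitled III", "Untitled IV"],
    "Acquisition Date": ["1990-05-01", "2015-03-12", None, "unknown"],
})

# Unparseable values become NaT instead of raising an error
moma["Acquisition Date"] = pd.to_datetime(moma["Acquisition Date"], errors="coerce")

# Works with no acquisition date cannot be placed on the timeline, so drop them
moma = moma.dropna(subset=["Acquisition Date"]).copy()
moma["Acquisition Year"] = moma["Acquisition Date"].dt.year
```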

After cleaning the data, we began adding columns to use in our analysis. The first column indicates whether the artist was alive when the work was acquired.
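One way to build this column is a small function applied row by row. The rule is our assumption, not something stated in the data. A missing death year is treated as "still living", and an artist who died in the acquisition year counts as alive. The column names `Death Year` and `Acquisition Year`, the helper `alive_at_acquisition`, and the toy values are all illustrative.

```python
import pandas as pd


def alive_at_acquisition(death_year, acquisition_year):
    """True if the artist was alive in the year the work was acquired (assumed rule)."""
    if pd.isna(death_year):
        return True  # assumption: no recorded death year means the artist is living
    return bool(death_year >= acquisition_year)


moma = pd.DataFrame({
    "Name": ["Artist A", "Artist B", "Artist C"],
    "Death Year": [1970.0, None, 2010.0],
    "Acquisition Year": [1985, 1999, 2010],
})
moma["Alive"] = moma.apply(
    lambda row: alive_at_acquisition(row["Death Year"], row["Acquisition Year"]),
    axis=1,
)
```

Treating a missing death year as "alive" can misclassify artists whose death was simply not recorded, so that choice is worth checking against the data.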

Next, we created a column to determine whether the artist was of European or Western descent.

From these unique values, we can see that there are several ways of designating an unknown nationality, which we deal with below. All the nationalities that are present, however, are in English and in a consistent format.

To tell whether an artist is European/Western or not, we made a list of all the nationalities in the MoMA data that are not European/Western.

Then we wrote a function that designates whether an artist is European/Western, used it to create a new column, and added that column to the MoMA dataframe. When deciding whether a country counted as European/Western, we referred to a list of European countries from this website: https://datahub.io/opendatafortaxjustice/listofeucountries#resource-listofeucountries
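The classification function can be sketched as a set lookup. Both sets below are hypothetical and heavily abbreviated; the real lists were built by hand from the unique values of the Nationality column. The function returns None for unknown nationalities instead of False, so that missing data is not counted as non-Western. The names `NON_WESTERN`, `UNKNOWN`, and `is_western` are illustrative.

```python
import pandas as pd

# Hypothetical, abbreviated lists (the real ones came from the MoMA's unique values)
NON_WESTERN = {"Japanese", "Mexican", "Brazilian", "Nigerian", "Chinese", "Indian"}
UNKNOWN = {"Nationality unknown", "Nationality Unknown"}


def is_western(nationality):
    """True for European/Western, False for non-Western, None when unknown or missing."""
    if pd.isna(nationality) or nationality in UNKNOWN:
        return None
    return nationality not in NON_WESTERN


labels = [is_western(n) for n in ["American", "Japanese", "Nationality unknown", None]]
```

On the dataframe this becomes `moma["Western"] = moma["Nationality"].apply(is_western)`.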

After cleaning and adding in extra columns for analysis, we can now create smaller datasets that only contain related columns, with names_dates_moma being the one we will be using most often.

Before we look at the other datasets, we analyze MoMA by itself. We start with a timeline of MoMA's acquisitions of art by female and non-Western/European artists in recent years.

Has MoMA acquired more diverse (based on gender and nationality) works of art in recent years?

Conclusion: There is no discernible pattern in the number of diverse works that MoMA acquires each year. However, the past 20 years have included more years with a high number of works by female or non-American artists. MoMA appears to have been more successful in collecting works by non-American artists than works by female artists.

Next, we examine whether MoMA has done what it set out to do: bring its collection up to the present.

Has the MoMA strived to acquire art by more recent artists?

Conclusion: This visualization shows that, over time, MoMA has consistently acquired more art by living artists than by deceased artists. Starting in 2004, there were more frequent peaks in acquisitions, but these peaks still kept the emphasis on art by living artists. This shows that MoMA has continued to meet its goal of expanding its collection to represent the present.

Examining National and Regional Diversity

Conclusion: This graph is not quite as straightforward as the ones before. Before the early 2000s, MoMA consistently acquired far less art from non-American, non-Western artists than from American or non-American Western artists. From the early 2000s through 2015, MoMA consistently acquired more works of art overall. At several points during this period, the number of works acquired from non-American, non-Western artists increased. In some years it even exceeded the number from American and non-American Western artists, which had not happened before. From this analysis, we see that after MoMA set its goal of diversifying its canon, it followed through by acquiring more art from non-American, non-Western artists than it had before.

We have now looked at the distribution of works acquired over the years by gender, age, and location. Next, we display the countries MoMA has art from on a map, use a Lorenz curve and Gini coefficient, and then fit a linear regression to see how MoMA's progress compares to a prediction of what it should have been.

Geographical Analysis

Comparing the Map of works per country pre-2012 and at the present

Using GeoPandas, we created maps showing the number of works per country for two time periods: all works acquired before 2012 and all works up to the present.

Since the MoMA DataFrame lists nationality and not country of origin, we need to convert the nationalities into the country names, so it can be merged with the existing geopandas world data. We used a demonyms dataset that included nationalities paired with their respective countries.
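The nationality-to-country step can be sketched as a left merge against a demonyms table. The table here is a hypothetical stand-in with assumed column names `nationality` and `country`, and all rows are invented. Listing the nationalities that found no country is a quick way to spot names that still need a manual fix. The country names must also match the world map's names exactly before the counts can be joined to it.

```python
import pandas as pd

# Hypothetical stand-in for the demonyms dataset (nationality -> country)
demonyms = pd.DataFrame({
    "nationality": ["American", "French", "German", "Japanese"],
    "country": ["United States of America", "France", "Germany", "Japan"],
})
works = pd.DataFrame({"Nationality": ["American", "French", "American", "Japanese", "Atlantean"]})

# Left merge keeps every work; unmatched nationalities get a missing country
merged = works.merge(demonyms, left_on="Nationality", right_on="nationality", how="left")
works_per_country = merged["country"].value_counts()
unmatched = list(merged.loc[merged["country"].isna(), "Nationality"].unique())
```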

Here is the map of the number of works per country before the renovation began in 2012. Let's make another map showing the works per country up to the present.

These visualizations allow us to see the number of works from each country. As we've seen, North America and Europe are very highly represented, particularly the United States, France, and Germany. On the opposite end of the spectrum, 70 countries are not represented and 17 countries are only represented by one work. The bins vary in size and scale to show how polarized the data are. Scaling by 10,000s or even 1,000s is not very revealing because there are so many countries with very few works, so we chose to define the bins in a way that would most clearly show the inequality between countries.
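Uneven bins like these can be built with `pandas.cut` and explicit edges. The edges and labels below are hypothetical and only show the idea: small counts get narrow bins so that countries with 0, 1, or a handful of works stay distinguishable. Countries with no works do not appear in a plain value count, so in practice they would come from the world map after filling missing counts with 0.

```python
import pandas as pd

counts = pd.Series({"United States": 50000, "France": 9000, "Chile": 40, "Ghana": 1, "Fiji": 0})

# Hypothetical, deliberately uneven edges; intervals are right-closed, e.g. (-1, 0] holds 0
edges = [-1, 0, 1, 10, 100, 1000, 10000, float("inf")]
labels = ["0", "1", "2-10", "11-100", "101-1,000", "1,001-10,000", ">10,000"]
binned = pd.cut(counts, bins=edges, labels=labels)
```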

Comparing pre-2012 statistics with the present

Since 2012, MoMA has added art from 8 new countries, all of them non-Western. However, it did not add any works from countries that had previously been represented by only one work.
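Both checks in this comparison reduce to simple set operations on per-country counts. Here is a sketch with invented counts. `added` finds countries that had no works before 2012, and `singletons_grown` finds previously single-work countries that gained more works.

```python
from collections import Counter

# Invented per-country counts for the two periods
pre_2012 = Counter({"United States": 120, "France": 40, "Ghana": 1})
present = Counter({"United States": 150, "France": 45, "Ghana": 1, "Peru": 2})

# Countries with works now but none before 2012
added = sorted(set(present) - set(pre_2012))

# Countries that had exactly one work before 2012 and have gained more since
singletons_before = {c for c, n in pre_2012.items() if n == 1}
singletons_grown = sorted(c for c in singletons_before if present[c] > 1)
```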

Lorenz Curve and Gini Coefficient

The Lorenz curve is typically used to depict the distribution of wealth, but it is also useful for showing how other variables are distributed over time. In this case, we used it to show the distribution of works by female artists over time. The x axis represents the cumulative percent of all artworks, and the y axis represents the cumulative proportion of artworks by female artists. Under perfect equality, this would be a line with a slope of 1 (y = x, meaning x and y increase at the same rate).

In the example of the distribution of wealth, perfect equality would mean that the bottom 10% of households hold 10% of the wealth, the bottom 20% hold 20%, and so on. In our example, perfect equality would mean that the first (chronologically earliest) 10% of all artworks contains 10% of all the artworks by female artists.

In contrast to perfect equality, the Lorenz curve shows the true distribution of female artists over time, with the same x and y axis as perfect equality.

The Lorenz curve is helpful because it gives a graphical representation of inequality and lets us calculate the Gini coefficient. The Gini coefficient measures how unevenly the art by female artists is distributed among all artworks. It is the area between the perfect equality line and the Lorenz curve, expressed as a proportion of the total area under the perfect equality line.

This graph is a good representation of perfect equality, the Lorenz curve, and the Gini coefficient.

For our axes, since there is no direct analogue to sorting a population from least to most wealthy, we sorted the data from earliest acquired to most recently acquired to show the trend over time.

Lorenz Curve

Since the Gini index depends on cumulative proportions and not counts, we need to find the total proportion of female artists for each cumulative decile. This requires us to divide the number of female artists in each cumulative decile by the total female population.

Because the deciles are nearly equal in size, the small differences between them are negligible. It therefore suffices to use a simple list of the decile values (0.1, 0.2, ..., 1.0) for the x axis.
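Here is a plain-Python sketch of the cumulative-decile calculation. It assumes each work has been reduced to an (acquisition year, is-by-a-female-artist) pair, and the helper name `lorenz_points` is hypothetical. Each decile boundary is rounded to a whole work, while the x value stays at exactly k/10; this is the simplification that the negligible differences in decile size allow.

```python
def lorenz_points(works, n_groups=10):
    """Cumulative (share of all works, share of works by female artists) at each decile.

    works: iterable of (acquisition_year, is_female) pairs.
    """
    ordered = sorted(works, key=lambda w: w[0])  # earliest acquisitions first
    n = len(ordered)
    total_female = sum(1 for _, is_female in ordered if is_female)
    if total_female == 0:
        raise ValueError("need at least one work by a female artist")

    points = [(0.0, 0.0)]
    for k in range(1, n_groups + 1):
        cutoff = round(k * n / n_groups)  # decile boundary, rounded to a whole work
        female_so_far = sum(1 for _, is_female in ordered[:cutoff] if is_female)
        points.append((k / n_groups, female_so_far / total_female))
    return points


# Toy data: 10 works acquired 2000-2009, the last five by female artists
toy = [(2000 + i, i >= 5) for i in range(10)]
points = lorenz_points(toy)
```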

Here is a dataframe with the cumulative proportion of artworks, the cumulative proportion of works by female artists, the decile start date, the decile end date, and the date range. This provides some context for each decile.

Now we have a graph showing perfect equality (blue line) and the Lorenz curve (orange line). The area between the two lines represents the inequality. The gap narrows in more recent years, showing a more equal distribution of works by female artists in later years, particularly starting in the 8th decile (all works until 2009). Interestingly, in the 9th decile (all works until around 2015) the Lorenz curve slightly surpasses perfect equality. This means that the earliest 90% of all artworks contain approximately 91% of all the works by female artists.

Gini Coefficient

To find the Gini Coefficient, we need to integrate the Lorenz curve and the perfect equality line to get the area under the Lorenz curve and the area under the perfect equality line respectively. We then use those values to find the Gini Coefficient. Mathematically, this is:

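Gini Coefficient = $\frac{\int_0^1 x\,dx - \int_0^1 L(x)\,dx}{\int_0^1 x\,dx} = 1 - 2\int_0^1 L(x)\,dx$, where $L(x)$ is the Lorenz curve and $y = x$ is the perfect equality line, whose area on $[0, 1]$ is $\frac{1}{2}$.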
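The formula can be evaluated numerically from the decile points. This sketch approximates the integral of the Lorenz curve with the trapezoidal rule; the text does not say which integration method was used, so that choice is our assumption, and `gini_from_lorenz` is a hypothetical helper. Stretches where the curve rises above perfect equality (as around the 9th decile) subtract from the coefficient. A curve lying entirely above the line would give a negative value.

```python
def gini_from_lorenz(points):
    """Gini = (area under y = x - area under Lorenz) / area under y = x.

    points: (x, y) pairs sorted by x, running from (0, 0) to (1, 1).
    The Lorenz area is approximated with the trapezoidal rule.
    """
    lorenz_area = sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    equality_area = 0.5  # integral of x from 0 to 1
    return (equality_area - lorenz_area) / equality_area


equal_line = [(k / 10, k / 10) for k in range(11)]
gini_equal = gini_from_lorenz(equal_line)  # approximately 0.0
gini_late = gini_from_lorenz([(0.0, 0.0), (0.5, 0.0), (1.0, 1.0)])  # 0.5
```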

We found a Gini coefficient of 0.218 (21.8%). The closer a Gini coefficient is to 0, the closer the data are to perfect equality.

Because the Lorenz curve lies below the perfect equality line for most of the timeline, works by female artists were acquired at a lower rate than works overall in the earlier deciles. In other words, they are concentrated in the later part of MoMA's acquisition history, so we can conclude that MoMA has acquired a higher proportion of works by female artists in more recent years. The slight crossing above perfect equality around the 9th decile has a separate meaning. It shows that the most recent 10% of acquisitions contains slightly less than 10% of the works by female artists.

It is important to remember that the Gini coefficient does not directly compare the percentage of female artists among all artists. Rather, it shows what percentage of the art by female artists was acquired in each decile, which reveals the distribution over time.

Linear regression predicting proportion of art by female artists

We used scikit-learn to run a linear regression on two different sets of data: all of the data, and only the data before 2012. This let us see whether the model could accurately predict the proportion of works by female artists.

First, we looked at the regression for all of the data.

For the next linear regression, we provided the model with data only up to but not including 2012, to see what its predictions would be for 2012 onward.

The points represent the proportion of works by female artists in a given year. The orange line is the best-fit line from the linear regression and represents the expected value for each year. The red segment covers the years we used to test the line, with data we had but left out of the dataframe the regression was fit on. The green segment represents the projection until 2025.

In 2012, the regression line underestimates the proportion of works by female artists. For all the other known years after 2012 it overestimates the proportion, except for 2017, which is predicted more or less accurately.

While the line predicted a fairly solid increase in the proportion, the actual data trended downward from 2012 onward. Although there is still a positive correlation overall, this supports our finding that MoMA could have acquired more works by female artists to make an even greater difference.

Both datasets showed a fairly strong positive correlation between year and gender, meaning that the proportion of works by female artists increases as the year increases. This supports the claim that MoMA did improve gender diversity, although correlation does not imply causation. However, the more recent years actually lowered the correlation slightly, showing that MoMA was not consistently increasing the proportion. In the last year for which we have data, the percentage did increase from the previous year.

Tate Data Extraction, Cleaning, and Loading

For a thorough analysis, we should compare MoMA's data to data from other museums. The first museum we examine is the Tate, another museum of modern and contemporary art, located in London, England.

The Tate dataset has many columns that we do not need, either because they are repetitive or because we won't use them later, so we created new datasets without them below.

Next, we merged the artist and artwork data. In the artworks dataset the artist column is called artist, and in the artists dataset it is called name. We used a left merge on the artworks so that artists who don't have any artworks are left out.

Then we dropped the name column because we no longer needed it.

Then, like we did for the MoMA, we removed the rows where the acquisition year was not present.
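Here is a pandas sketch of the Tate merge, the column drop, and the missing-year filter described above. The column names (`artist`, `name`, `acquisitionYear`, `gender`, `placeOfBirth`) are assumptions about the Tate CSV files, and all values are invented. Because the artworks are on the left of the join, artists with no artworks (Artist Z here) never enter the result.

```python
import pandas as pd

# Toy stand-ins for the Tate artworks and artists files; values are invented
artworks = pd.DataFrame({
    "artist": ["Artist A", "Artist B", "Artist A"],
    "acquisitionYear": [1856.0, 1964.0, None],
})
artists = pd.DataFrame({
    "name": ["Artist A", "Artist B", "Artist Z"],  # Artist Z has no artworks
    "gender": ["Male", "Female", "Male"],
    "placeOfBirth": ["London, United Kingdom", "Wakefield, United Kingdom", "Paris, France"],
})

# Left join keeps every artwork; artists without artworks are dropped
tate = artworks.merge(artists, how="left", left_on="artist", right_on="name")
tate = tate.drop(columns="name")                # duplicates "artist" after the join
tate = tate.dropna(subset=["acquisitionYear"])  # same rule as for the MoMA
```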

When examining the dates, we noticed that they were not as messy as the MoMA's, so we only had to change two dates.

Next, we made three new columns similar to the ones we made for MoMA: one for the artist's age when they made the work, one indicating whether the artist was alive when the Tate acquired the work, and one representing the artist's nationality.

Finally we utilized the same function and apply method as we did above to make a column determining if the artist was alive or not.

Next, before we could start making a new column for the Tate to represent nationality we noticed that the placeOfBirth column contained both the city and the country for the majority of the artists. We realized that in order to analyze the country that the artists were from we needed to separate the city from the country, so we made a new column called countryOfOrigin that contained just the country.

The country names in the Tate dataset were especially difficult to work with because of the many variations in naming. The majority of the names were in the country's native language, or there was a city name rather than a country, as you can see below. Because of this, we had to go through the unique values manually and add the ones we wanted to the list.

First, we replaced the string "nan" with np.nan. Then we created a new dataframe that contained no np.nan values for country of origin.
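Splitting placeOfBirth into a country, cleaning the literal "nan" strings, and dropping rows without a country can be sketched as follows. The sketch assumes the country is the text after the last comma, which is our assumption. As the text notes, entries that hold only a city (like "Paris" here) or a native-language name (like "Österreich") still need the manual review described above.

```python
import numpy as np
import pandas as pd

# Invented examples of the formats seen in placeOfBirth
tate = pd.DataFrame({
    "placeOfBirth": ["London, United Kingdom", "Wien, Österreich", "Paris", "nan", np.nan],
})

# Keep the text after the last comma (assumed to be the country) and trim spaces
tate["countryOfOrigin"] = tate["placeOfBirth"].str.split(",").str[-1].str.strip()

# The literal string "nan" (e.g. from an earlier astype(str)) is not a real country
tate["countryOfOrigin"] = tate["countryOfOrigin"].replace("nan", np.nan)
tate_countries = tate.dropna(subset=["countryOfOrigin"])
```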

Next, we created a list of the countries and a function that fills a new column with True or False, depending on whether the artist's country is European.

After each of those steps our dataframe is now clean and in the format that we want it in, which is displayed below.

Analysis: Tate vs MoMA

Now that the datasets are in similar formats, we can start our comparisons and analysis. However, the Tate data stops in 2013, so we need to restrict the MoMA data to the same period.

How does the diversity of the two museums compare?

Comparing Western vs non-Western Art in each museum

The first visualization compares the diversity of artists at the Tate and MoMA by looking at how many artists are Western or non-Western.
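Because the two collections differ so much in size, a comparison like this is easier to read as within-museum shares than as raw counts. Below is a sketch using `pandas.crosstab` with `normalize="index"` on invented data. The `Group` and `Museum` column names are illustrative.

```python
import pandas as pd

# Invented Western/non-Western labels for each museum
moma = pd.DataFrame({"Group": ["Western"] * 8 + ["Non-Western"] * 2})
tate = pd.DataFrame({"Group": ["Western"] * 19 + ["Non-Western"] * 1})

combined = pd.concat(
    [moma.assign(Museum="MoMA"), tate.assign(Museum="Tate")],
    ignore_index=True,
)

# normalize="index" turns counts into shares within each museum (each row sums to 1)
shares = pd.crosstab(combined["Museum"], combined["Group"], normalize="index")
```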

Conclusion: This comparison shows that the MoMA, even before its renovation, was ahead of the Tate with regard to artists who were not of European or Western descent.

Comparing Art by Male vs Female artists in each museum

Next, we looked at the difference in size between the male and female artist populations at the Tate and MoMA, since this was another area MoMA was striving to improve.

Conclusion: This is an interesting comparison, as it once again shows that MoMA, even before its renovation, had greater diversity than the Tate, with a much larger proportion of female artists to male artists.

Comparing Art by Male vs Female artists of non-Western art in each museum

What about the male/female ratio when looking at non-European/Western artists?

Conclusion: The data shows that MoMA has a larger proportion of female non-Western/European artists than the Tate, but this difference is smaller than the one above.

Do countries favor their own art over foreign art? Could this be a bias towards American or British art across many museums?

We next examined the proportion of British works in the Tate, the proportion of American works in the MoMA, the proportion of American works in the Tate, and the proportion of British works in MoMA to see if the museums tended to collect more work created by artists from their respective countries or if there was a significant bias towards American art in the Tate or British art in MoMA.

Conclusion: It looks like both MoMA and the Tate favor art from their home country, with the Tate having a higher percentage of British art than MoMA has American art. There is roughly the same proportion of British art in MoMA as there is American art in the Tate.

Examining gender and home country bias in the two museums

How does the gender ratio look when examining art from the museum's home country?

Conclusion: Both museums favored male artists. Among works by male artists, MoMA has a higher proportion of non-American art than American art, while the Tate heavily favors British male artists. Among works by female artists, MoMA favors American over non-American artists, while the Tate shows no distinct difference between British and non-British female artists.

Met Data Extraction, Cleaning, and Loading

The second museum we compare to MoMA is the Metropolitan Museum of Art (the Met), which offers many open-source resources about its collection. We extracted the data from the .csv file on the Met's GitHub page. Because the dataset is so large, downloading it requires GitHub Large File Storage.

It was a little more difficult to work with than the previous two datasets because of its size. The file is too large to be posted on our GitHub or viewed on the Met's GitHub.

The Met dataset has many columns that we don't need for the comparison with MoMA, so we drop them below.

The Met contains multiple departments and isn't limited to modern and contemporary art, so we filter by department to look only at the Met's modern and contemporary art.

Next we drop the rows that don't have an Accession year, and filter for the years that are before 2017, as the MoMA dataset only goes up to 2017 but the Met is frequently updated and has works acquired in 2020 in its dataset.
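The department filter and the accession-year cleanup can be sketched in pandas as follows. Several details here are assumptions: the exact department label, the column names `Department` and `AccessionYear`, and the presence of stray text in the year column (which is why we coerce to numbers first). The cutoff keeps years strictly before 2017, as the text says.

```python
import pandas as pd

DEPARTMENT = "Modern and Contemporary Art"  # assumed label in the Department column
CUTOFF_YEAR = 2017                          # keep acquisitions before this year, per the text

# Invented rows standing in for the Met's open-access CSV
met = pd.DataFrame({
    "Department": [DEPARTMENT, DEPARTMENT, "Egyptian Art", DEPARTMENT, DEPARTMENT],
    "AccessionYear": ["1999", None, "1920", "2020", "n/a"],
})

modern = met[met["Department"] == DEPARTMENT].copy()
# Coerce to numbers so stray text becomes NaN, then drop missing years
modern["AccessionYear"] = pd.to_numeric(modern["AccessionYear"], errors="coerce")
modern = modern.dropna(subset=["AccessionYear"])
modern = modern[modern["AccessionYear"] < CUTOFF_YEAR]
```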

After filtering, we looked at the columns where we had earlier noticed problems. For instance, Artist Nationality, displayed below, sometimes contains multiple nationalities divided by '|'. The part after the '|' sometimes contains even more nationalities separated by ','. When examining the column, we noticed that the first nationality was usually the one the artist was born with. The others were usually the countries where the artist died or spent their later life. Because of this, we chose to keep the first nationality every time.
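Keeping only the first nationality comes down to two string splits. The sketch below is our own illustration with a hypothetical helper name, `first_nationality`. It returns None when the value is missing or when the first entry is empty, as in "|French", so those rows can be handled the same way as other missing nationalities.

```python
def first_nationality(value):
    """Keep only the first nationality in strings like 'American|French, German'."""
    if not isinstance(value, str):
        return None  # NaN or other missing placeholders
    first_segment = value.split("|")[0]               # text before the first '|'
    first_name = first_segment.split(",")[0].strip()  # text before the first ','
    return first_name or None                         # empty first entry -> None


examples = ["American|French, German", "German, American|French", " Italian ", "|French"]
cleaned = [first_nationality(v) for v in examples]  # ['American', 'German', 'Italian', None]
```

On the dataframe, this would be applied as `met["Artist Nationality"].apply(first_nationality)`.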

Next, we tried to clean the values in the Artist Gender column, but on further inspection it doesn't contain any usable information. Its entries consist only of '|' separators and the value Female. It would be interesting if the Met's modern and contemporary works were all by female artists, but we know this is not true. We displayed the column's unique values to show how messy it is. Because of this, we cannot compare the Met and MoMA on gender as we did for MoMA and the Tate.

Analysis: Met vs MoMA

Unlike the Tate the Met doesn't have a usable gender column so we won't be able to conduct quite as much analysis as we did with the Tate, but we can still compare the diversity of the two museums based on the nationality of the artists.

Comparing Western vs non-Western Art in each museum

First, we remove the rows where there is no nationality for the artist.

Then we examine the different nationalities and compile the Western ones into a list.

With this list we can filter and create a new column like we did before for the Tate and the MoMA to determine if the artist is European or not.

Next, we create a visualization comparing MoMA to the Met, as we did for the Tate.

Conclusion: This comparison shows that the MoMA, even before its renovation, was ahead of the Met with regard to artists of modern and contemporary art who were not of European or Western descent.

Proportion of American vs Non American Works for the MoMA and the Met

One final comparison is how the proportion of American vs non-American works differs between MoMA and the Met, two large American museums.

Conclusion: MoMA outperforms the Met in diversity, with a larger ratio of non-American works to American works. This shows that MoMA places more emphasis on acquiring and retaining works from outside the United States and is more representative of the world as a whole. Compared to the Met, MoMA is keeping to its goal of diversifying the canon.

Final Conclusion

We feel confident in saying that MoMA has accomplished its goal of increasing diversity. We don't have all the acquisitions up to the end of the renovation in 2019. Even so, based on trends and predictions, we are confident that MoMA did a good job of diversifying its collection. Patterns in the data suggest that MoMA is becoming more diverse in the key areas it defined: gender, nationality, and modernity of works.

However, our analysis has some limitations. There was little personal information about the artists, which limited the features we could examine. For example, we do not know whether MoMA acquired more works from LGBTQ artists or from artists of different socioeconomic backgrounds. We did have the artists' nationalities, but nationality can be complex and defined in different ways. It could mean the artist's country of origin or the country where the artist resides, or an artist could identify with multiple nationalities.

Additionally, nationality is not necessarily an indication of an artist's race or ethnicity. Nationality does not distinguish between Black Americans, Asian Americans, Indigenous Americans, and Hispanic and Latino Americans, so it does not reflect the actual diversity of American artists.

As we mentioned previously, we do not have data from the final years of the renovation, so we cannot see the final statistics. We still feel confident based on MoMA's stated intent and the data we have.

The comparisons with both the Tate and the Met further confirm MoMA's commitment to diversity. Even before the second renovation was completed, MoMA already surpassed the Tate and the Met in the key categories of diversity that we were able to measure. The range of its collection shows MoMA's commitment to truly expanding the canon and including underrepresented demographics. That said, MoMA, like the other two museums, is still dominated by works by Western male artists. Comparing acquisitions of works by female and non-Western artists with all works acquired in a single year gives a good picture of the types of works MoMA has been collecting in recent years. Looking at the overall collection, however, there is still a lot of work to be done for it to be truly representative and diverse.