Introducing Critical Discourse Analysis


In the remaining section of the guide, I conduct a Critical Discourse Analysis (CDA) of the qualitative data available in the data sources we have worked with: the descriptions of riot events from The Guardian’s dataset, and the responses in the data source representing the perspectives of participants in the 2011 riots. This analysis allows us to computationally categorize the entities (words) used in these records by sentiment (positive, negative, or neutral), helping us infer the meanings and biases that such data sources might impose on their readers. I continue to use Google Data Studio for interactive visualizations, and I connect the data source in Google Sheets to the Google Cloud Natural Language API through Apps Script to run the analysis.

Analyzing Data Sources with CDA


I continue working with The Guardian’s dataset on the 2011 riot incidents in the United Kingdom, building on our initial transformation of its data and extending our representation of it in Google Data Studio, but this time focusing more specifically on quantitatively and qualitatively analyzing the entries in the “Event Description” variable. By doing so, we begin implementing Critical Discourse Analysis, which allows us to identify trends in our data: the positive, negative, and neutral sentiments attached to the words used in “Event Description,” and the frequencies of those words. We focus on two levels of analysis for our CDA:

(1) Entity Sentiment Analysis for all entries in the EventDescription column (N of records = 245)

(2) Entity Sentiment Analysis for all entries in the EventDescription column, grouped by their “Authority” (N of records = 42, with each including a merged “EventDescription” entry; we remove the single authority with “Missing Data”)
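
The second level requires one merged description per authority before the analysis is run. A minimal sketch of that grouping step follows, assuming the rows have already been read into objects. The field names `authority` and `eventDescription`, the helper name, and the sample rows are all hypothetical and chosen for illustration only.

```javascript
// Hypothetical helper: merges all descriptions that share an authority.
// The field names are assumptions about how the sheet rows were loaded.
function mergeDescriptionsByAuthority(records) {
  const merged = new Map();
  for (const { authority, eventDescription } of records) {
    // Skip rows without a usable authority or description.
    if (!authority || authority === 'Missing Data' || !eventDescription) continue;
    const existing = merged.get(authority);
    merged.set(authority, existing ? existing + ' ' + eventDescription : eventDescription);
  }
  // One output row per authority, in first-seen order.
  return Array.from(merged, ([authority, comments]) => ({ id: authority, comments }));
}

// Invented sample rows, not taken from the dataset.
const sampleRecords = [
  { authority: 'Authority A', eventDescription: 'Shops looted.' },
  { authority: 'Missing Data', eventDescription: 'Car set alight.' },
  { authority: 'Authority A', eventDescription: 'Bins set on fire.' },
  { authority: 'Authority B', eventDescription: 'Store damaged.' },
];
const mergedRows = mergeDescriptionsByAuthority(sampleRecords);
```

Each returned object carries an `id` and a `comments` value, matching the column names the Apps Script listing in this guide expects.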

I propose conducting CDA because it offers a critical approach to analyzing textual data: it can surface patterns that were not evident when looking at the data source itself or at its representation in Data Studio. Because these patterns can be interpreted quantitatively, we can start forming and comparing inferences about the particular words used to narrate events within a specific “Authority,” as well as the words The Guardian used to narrate the riots across the United Kingdom in the nationwide project from which we drew this data.

Looking back at some of the most prominent CDA scholarship in the context of histories of riots, in “Racism and the Press” Van Dijk outlined such an approach by examining quantitatively (see 1-3) and qualitatively (see 4-5):[87]

(1) frequencies of words in the headlines of 5 British newspapers (following the events of 1981, with data from articles published between 1981 and 1986)

(2) frequencies of historical actors in such headlines

(3) frequencies of categorical relations identified between such actors

(4) the definitions and connotations of the most frequent words, such as “riots”

(5) possible relationships between connotations of (1) - (3) and their ideological implications (constituted by pre-assigned socio-political positions of such newspapers)
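
The quantitative step (1) can be illustrated with a plain word-frequency count. This is a minimal sketch with a deliberately naive tokenizer, not a reconstruction of Van Dijk’s procedure. The function name and the sample texts are invented for illustration.

```javascript
// Counts how often each word appears across a list of short texts.
function wordFrequencies(texts) {
  const counts = new Map();
  for (const text of texts) {
    // Lowercase, then keep runs of letters and apostrophes (naive tokenizer).
    const words = text.toLowerCase().match(/[a-z']+/g) || [];
    for (const word of words) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  // Most frequent first; ties are broken alphabetically for stable output.
  return [...counts.entries()].sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}

// Invented sample texts, not real headlines.
const sampleTexts = ['Riot police deployed', 'Police respond to riot', 'Shops closed early'];
const topWords = wordFrequencies(sampleTexts).slice(0, 2);
```

A real analysis would likely also remove stop words such as “to,” which this sketch leaves in.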

I build on Van Dijk’s approaches to conducting CDA in the context of The Guardian’s data on the riots of 2011. In doing so, we make two assumptions. First, we assume a high level of similarity between the “headlines” type of data and the “EventDescription” data, as both are intended to describe events briefly, in the form of a short summary. Second, we assume that “EventDescription” extends the usability of headline data, as it also includes records that describe events in multiple sentences, generally providing more information and potentially representing a more complex narrative of such events.

CDA Implementation with Google Cloud Natural Language API


Considering the functionality of the Google Cloud Platform and Data Studio, such as the ability to ingest new data sources (here, textual data processed through sentiment analysis) and present them via interactive dashboards like the previous one, we proceed to use the Natural Language API and Apps Script to investigate the data in EventDescription more critically. We do so both in general (across all entries of the dataset) and for each particular “Authority,” drawing on the following functionality:
  1. Creating an API key for the Natural Language API
  2. Running a script via Apps Script to implement the Entity Sentiment Analysis of our data
  3. Ingesting such data source into a Google Data Studio dashboard
  4. Interactively representing the NLP data from Entity Sentiment Analysis
    1. Sorting the words by their frequencies in text
    2. Filtering by Sentiment Score (lower than 0 - “negative,” equal to 0 - “neutral,” larger than 0 - “positive”)
    3. Utilizing the “Filter” to look up specific words in the “EventDescription”
    4. Filtering by “Authority” name for comparison of localities
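
The score thresholds and the sorting described in the list above can be sketched in a few lines. This only illustrates the logic that the dashboard filters apply. The property names `score` and `mentions`, the function names, and the sample rows are assumptions made for the example.

```javascript
// Maps a sentiment score to the three labels used in the dashboard filter.
function sentimentLabel(score) {
  if (score < 0) return 'negative';
  if (score > 0) return 'positive';
  return 'neutral';
}

// Keeps the rows with the requested label, most frequently mentioned first.
function filterAndSort(rows, label) {
  return rows
    .filter((row) => sentimentLabel(row.score) === label) // filter() returns a new array
    .sort((a, b) => b.mentions - a.mentions);
}

// Invented sample rows, not results from the dataset.
const sampleRows = [
  { entity: 'entity one', score: -0.4, mentions: 3 },
  { entity: 'entity two', score: 0, mentions: 5 },
  { entity: 'entity three', score: -0.1, mentions: 7 },
  { entity: 'entity four', score: 0.6, mentions: 1 },
];
const negatives = filterAndSort(sampleRows, 'negative');
```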

We build on and extend the existing documentation for the Google Cloud Natural Language API (Analyzing text in a Google Sheet using Cloud Natural Language API and Apps Script and Connect to an API: Analyze feedback sentiment | Apps Script | Google Developers) to implement the CDA approach in the context of our project. To do so successfully, we need to complete the following steps:
  1. Set up the Google Cloud project environment via Cloud console (“Select a project,” such as the one we created for ingesting data into the Data Studio): Google Cloud Console Project Selector
  2. Enable the Google Cloud Natural Language API: Google Cloud Console APIs Enable Flow
  3. Configure the OAuth consent: Google Cloud Console APIs Credentials Consent
  4. Get an API key for the Google Cloud Natural Language API: Google Cloud Console APIs Credentials

Once these settings are configured, we need to set up and run the script that analyzes the textual data with the help of Google Cloud’s Natural Language API and Apps Script:
  1. Create the Apps Script Project using the following Google Sheets with Apps Script template: Google Sheets Link
  2. Add our project text data to the “comments” column of the Sheet
  3. Run the Script using “Sentiment Tools” - “Mark entities and sentiment”

The script is pasted, commented, and referenced below; we adapt the code from the work of Alicia Williams in Analyzing text in a Google Sheet using Cloud Natural Language API and Apps Script (blog post) and Analyzing Text Data with Google Sheets and Cloud Natural Language (Cloud Next '18) (video):

// To learn how to use this script, refer to the documentation:
// https://developers.google.com/apps-script/samples/automations/feedback-sentiment-analysis

/*
Copyright 2022 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

  https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

// Sets API key for accessing Cloud Natural Language API.
// Never publish a real key; keep it out of shared copies of this script.
const myApiKey = 'YOUR_API_KEY'; // Replace with your API key.

// Matches column names in Review Data sheet to variables.
let COLUMN_NAME = {
  COMMENTS: 'comments',
  ENTITY: 'entity_sentiment',
  ID: 'id'
};

/**
 * Creates a "Sentiment Tools" menu in Google Sheets.
 */
function onOpen() {
  SpreadsheetApp.getUi()
    .createMenu('Sentiment Tools')
    .addItem('Mark entities and sentiment', 'markEntitySentiment')
    .addToUi();
}

/**
 * Analyzes entities and sentiment for each comment in
 * Review Data sheet and copies results into the
 * Entity Sentiment Data sheet.
 */
function markEntitySentiment() {
  // Sets variables for "Review Data" sheet
  let ss = SpreadsheetApp.getActiveSpreadsheet();
  let dataSheet = ss.getSheetByName('Review Data');
  let rows = dataSheet.getDataRange();
  let numRows = rows.getNumRows();
  let values = rows.getValues();
  let headerRow = values[0];
  // Checks to see if "Entity Sentiment Data" sheet is present, and
  // if not, creates a new sheet and sets the header row.
  let entitySheet = ss.getSheetByName('Entity Sentiment Data');
  if (entitySheet == null) {
    // Assigns to the outer variable (no second `let`), so the new sheet
    // is still available when results are written further down.
    entitySheet = ss.insertSheet('Entity Sentiment Data');
    let esHeaderRange = entitySheet.getRange(1, 1, 1, 6);
    let esHeader = [['Review ID', 'Entity', 'Salience', 'Sentiment Score',
                     'Sentiment Magnitude', 'Number of mentions']];
    esHeaderRange.setValues(esHeader);
  }
  // Finds the column index for the comments, entity_sentiment,
  // and id columns.
  let textColumnIdx = headerRow.indexOf(COLUMN_NAME.COMMENTS);
  let entityColumnIdx = headerRow.indexOf(COLUMN_NAME.ENTITY);
  let idColumnIdx = headerRow.indexOf(COLUMN_NAME.ID);
  if (entityColumnIdx == -1) {
    Browser.msgBox("Error: Could not find the column named " + COLUMN_NAME.ENTITY +
                   ". Please create an empty column with header \"entity_sentiment\" on the Review Data tab.");
    return; // bail
  }
  ss.toast("Analyzing entities and sentiment...");
  // Starts at 1 because row 0 is the header row.
  for (let i = 1; i < numRows; ++i) {
    let value = values[i];
    let commentEnCellVal = value[textColumnIdx];
    let entityCellVal = value[entityColumnIdx];
    let reviewId = value[idColumnIdx];

    // Calls retrieveEntitySentiment function for each row that has a comment
    // and also an empty entity_sentiment cell value.
    if (commentEnCellVal && !entityCellVal) {
      let nlData = retrieveEntitySentiment(commentEnCellVal);
      // Pastes each entity and sentiment score into Entity Sentiment Data sheet.
      let newValues = [];
      // Falls back to an empty list if the response carries no entities.
      for (let entity of (nlData.entities || [])) {
        let row = [reviewId, entity.name, entity.salience, entity.sentiment.score,
                   entity.sentiment.magnitude, entity.mentions.length];
        newValues.push(row);
      }
      if (newValues.length) {
        entitySheet.getRange(entitySheet.getLastRow() + 1, 1, newValues.length, newValues[0].length).setValues(newValues);
      }
      // Pastes "complete" into entity_sentiment column to denote completion of NL API call.
      dataSheet.getRange(i + 1, entityColumnIdx + 1).setValue("complete");
    }
  }
}

/**
 * Calls the Cloud Natural Language API with a string of text to analyze
 * entities and sentiment present in the string.
 * @param {String} line the string for entity sentiment analysis
 * @return {Object} the entities and related sentiment present in the string
 */
function retrieveEntitySentiment(line) {
  let apiKey = myApiKey;
  let apiEndpoint = 'https://language.googleapis.com/v1/documents:analyzeEntitySentiment?key=' + apiKey;
  // Creates a JSON request, with text string, language, type and encoding
  let nlData = {
    document: {
      language: 'en-us',
      type: 'PLAIN_TEXT',
      content: line
    },
    encodingType: 'UTF8'
  };
  // Packages all of the options and the data together for the API call.
  let nlOptions = {
    method: 'post',
    contentType: 'application/json',
    payload: JSON.stringify(nlData)
  };
  // Makes the API call and parses the JSON body of the response.
  let response = UrlFetchApp.fetch(apiEndpoint, nlOptions);
  return JSON.parse(response.getContentText());
}
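
To make the data flow easier to follow, the sketch below shows the general shape of an entity sentiment response and how one such response becomes rows for the “Entity Sentiment Data” sheet. It runs outside Apps Script and makes no API call. The helper name `toEntityRows` and every value in `sampleResponse` are invented for illustration; only the field names follow the structure the script above reads.

```javascript
// Hypothetical helper mirroring the row-building loop in markEntitySentiment.
function toEntityRows(reviewId, nlData) {
  // A response without entities (for example, an error body) yields no rows.
  const entities = (nlData && nlData.entities) || [];
  return entities.map((entity) => [
    reviewId,
    entity.name,
    entity.salience,
    entity.sentiment.score,
    entity.sentiment.magnitude,
    entity.mentions.length,
  ]);
}

// Invented response; the numbers are placeholders, not real API output.
const sampleResponse = {
  entities: [
    {
      name: 'shop',
      salience: 0.6,
      sentiment: { score: -0.5, magnitude: 1.0 },
      mentions: [{ text: { content: 'shop' } }, { text: { content: 'shops' } }],
    },
    {
      name: 'street',
      salience: 0.4,
      sentiment: { score: 0, magnitude: 0 },
      mentions: [{ text: { content: 'street' } }],
    },
  ],
  language: 'en',
};
const entityRows = toEntityRows('record-1', sampleResponse);
```

Each inner array lines up with the six header columns: Review ID, Entity, Salience, Sentiment Score, Sentiment Magnitude, and Number of mentions.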

Connecting the Entity Sentiment Analysis data source to Google Data Studio





Figure 17. Current view of the Entity Sentiment Analysis page of the Dashboard, for the “Event Description” Variable (Link).


I ingest the new data source into Google Data Studio (by using “Add Data” and connecting the Google Sheet). We visualize the data in tabular form, presenting the records by negative (SC < 0), positive (SC > 0), and neutral (SC = 0) sentiment score (SC) assignments. Figure 17 presents how I visualize the Entity Sentiment Analysis (ESA) in the dashboard. I extend the same script and ESA approach to the interview data from participants in the 2011 riots (Figure 18), identifying the sentiment within a data source that differs from The Guardian’s, as it covers the motivations and potential causes behind the 2011 riots.



Figure 18. Current view of the Entity Sentiment Analysis page of the Dashboard, for the “Interview Response” Variable (Link).

Footnotes

[87] Teun A. van Dijk, “Racism and the Press,” Critical Studies in Mass Communication 8, no. 2 (June 1991): 183–98, https://doi.org/10.1080/15295039109366705.