Thursday, April 30, 2020

A note on minimizing common problems reporting trends in Covid-19 statistics

Some of the most widely-reported statistics about Covid-19 are based on daily updates of total "new" cases in the previous 24-hours. These updates are used to track increases and decreases in everything from the number of tests to the number of deaths from the pandemic.

But these daily totals can be misleading. The day that a case is first reported may not be the day the case actually occurred. Cases are often reported days or weeks after their occurrence because of lags in the collection and distribution of Covid-19 data.

Reports that use the actual day a case occurred are a better way to measure trends. But day-of-occurrence reports can also be misleading because of lags in reporting cases.

The problems created by lags are especially apparent in many popular dashboards that use graphics to illustrate Covid-19 trends.

The Ohio Department of Health is an exception. The department's dashboard features clear presentations of Covid-19 trends that include the limitations of the data.

The Ohio data is updated every day at 2 p.m. I used data from multiple updates to produce my own graphics and analysis for this post. I have no affiliation with the Ohio Department of Health.


I am using Ohio deaths as an example of general problems reporting Covid-19 tests, cases, hospitalizations and other measures. Covid-19 trends are often reported with bar charts, so I will do the same.

My first two charts (above) show total "new" deaths reported each day in orange. The blue chart shows the day each death actually occurred. Both charts cover the period from the first Ohio coronavirus death to April 21.

The orange chart shows that March 20 is the first day that "new" Ohio deaths were reported. But the blue chart shows the first death actually occurred on March 17.

You can also see that the orange distribution of "new" deaths does not accurately depict the actual distribution of deaths in blue.

The number of "new" deaths being reported appeared to be increasing on April 21. But the blue chart shows that deaths per day appeared to be decreasing.


These next charts (above) include data from eight more days, extending the analysis from April 21 to April 29. The orange chart includes a dramatic spike on April 29 because 138 "new" deaths were first reported on that day.

The spike is misleading. The blue chart includes these "new" deaths on days the deaths actually occurred. The blue chart again shows the actual number of deaths per day appear to be decreasing.

But the blue chart also uses lagged data, making it incomplete and possibly misleading. Recall the decline in my first blue chart ending on April 21. That decline vanished after eight days of updates.

Five of the 138 "new" deaths occurred on unknown dates, so they are not reported in the blue chart. This is not unusual; dates are normally added in subsequent updates. But this is another example of how lags complicate efforts to identify Covid-19 trends.



This next graphic (above) compares both charts of "new" deaths reported each day. Ovals highlight April 14-21. The distribution of deaths did not change from the first to the second chart. This shows how daily totals can persistently misrepresent the distribution of deaths.




This graphic (above) compares both charts of the actual number of deaths each day. Ovals highlight April 14-21.

The distribution of deaths changed from the first to the second chart. The first chart incorrectly shows a decline in deaths. The second chart shows deaths were actually constant or increasing from April 14-21.

This illustrates how the accuracy of lagged data improves over time. As more deaths were reported the counts for the highlighted days were revised upward.

Lags are typically concentrated in the most recent days in any report. So the decline that now appears from April 22-29 might also disappear after new updates in coming days.
 

My last charts (above) show running totals, another common way to report trends associated with Covid-19. The orange chart is "new" deaths reported each day, and the blue chart is deaths occurring each day.

Circles highlight the most recent seven days. The running total of "new" deaths shows a rapid increase in deaths. This is not accurate.

The running total of deaths each day shows slower increases that are starting to level off. But this may change when daily death reports are updated with lagging data. So this curve might also be inaccurate.

Better ways to accurately report trends associated with Covid-19

A complete count of cases associated with Covid-19 probably will not be available for months or years. But the public, public health officials, and policy makers cannot wait that long. There is enormous demand for immediate information because we need to slow the virus now.

The best way to minimize the inaccuracies created by lagging data is by averaging over a long period of time. Most of this period should be days where counts have stabilized, and major revisions have ended.

For example, counts for the number of deaths in Ohio are typically revised for about 10 days after initial reporting, so trends should be for periods of at least 30 days. But counts for the number of hospitalizations are typically revised for a much longer period, so trends should account for this difference.

I use the percent change every three days to estimate Covid-19 trends. An example is (April 28 deaths/April 25 deaths). This measure, from economist Arnold Kling, is simple and intuitive -- a result larger than 1 means deaths are increasing, smaller than 1 means deaths are decreasing.

I then calculate the median change for the most recent 30 days to determine the trend.[a] This statistic was 1.06 on April 29, meaning the median three-day change in deaths was a 6 percent increase. Six days earlier the median three-day change was 1.17, or a 17 percent increase. So increases in Ohio deaths may have slowed.
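For readers who want to reproduce this measure, here is a minimal sketch in Python. The daily counts below are made-up illustrations, not Ohio data; the real calculation would use counts of deaths by day of occurrence.

```python
# Sketch of the trend measure described above: the ratio of deaths on
# day t to deaths on day t-3, summarized by the median over a recent
# window of days. The counts below are made up for illustration.
from statistics import median

def three_day_ratios(daily_deaths):
    """Ratio of each day's count to the count three days earlier."""
    return [daily_deaths[i] / daily_deaths[i - 3]
            for i in range(3, len(daily_deaths))
            if daily_deaths[i - 3] > 0]

def trend(daily_deaths, window=30):
    """Median three-day ratio over the most recent `window` days."""
    return median(three_day_ratios(daily_deaths)[-window:])

# Made-up daily death counts; a result above 1 means deaths are rising.
counts = [10, 12, 11, 13, 15, 14, 16, 18, 17, 19]
print(round(trend(counts, window=7), 2))  # 1.23, i.e. a 23 percent three-day increase
```

A result larger than 1 means the count is increasing and smaller than 1 means it is decreasing, matching the interpretation described above.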

Similar measures are the best way to report other trends associated with Covid-19. But I don't think it's realistic to expect such statistics to become the norm.

However, reports should stop focusing on daily totals of "new" cases. Instead, they should report the current total number of cases for a relevant period of time.

The best measure of a trend is cases on the actual days the cases occurred. This should be the preferred measure for reports whenever the data is available.

Regardless of the measure, the time period should be part of every report. This period should minimize the number of days still being revised because of lagging data. All reports should explain the limitations of the measure being used.

Graphics that show trends across time should only be used for day-of-occurrence data. These graphics should explain that counts for recent days may change because of lags in reporting data.

[a] The median minimizes the influence of unusually large changes, or outliers.

Saturday, March 7, 2020

Coronavirus is a test for local journalism, will it pass?


Information and misinformation about the new Coronavirus have for weeks been easily available to anyone with an Internet connection -- i.e. almost everyone in the United States. So news organizations that don't cover this story until the virus is detected in their community are failing an important test.

People want information because they are justifiably concerned about the Coronavirus. Many people are getting sick, and some are dying. There is not yet a medicine or vaccine to treat the virus.  Mobile phones and computers make it easy to find, follow, and share reports about the virus on social media, search engines, and websites.


Local journalists compete directly with the information that people are finding on the internet. Journalists who aren't covering this story are losing this competition and signaling irrelevance to potential audiences. This is not a good strategy when local journalism is struggling to survive.

Searches coincide with developments in the news

I live in Athens, Ohio, a state that has not yet reported any infections. But Google's data on the volume of Coronavirus searches shows interest in Ohio coincides with major news about the virus.



The chart compares Ohio searches on the topic of Coronavirus with Ohio searches on the topic of the flu from December to March. Each topic includes many different search terms. Interest is measured on a scale from zero to 100, where 100 represents a peak in searches.[1]
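Google does not publish the underlying search counts, but the zero-to-100 index it describes can be sketched simply: each period's search volume is rescaled so the peak equals 100. The numbers below are invented for illustration only.

```python
# Rescale a series of search volumes (arbitrary units) so the
# largest value becomes 100, as the Google Trends index does.
def scale_to_100(volumes):
    peak = max(volumes)
    return [round(100 * v / peak) for v in volumes]

# Invented weekly search volumes; the peak week scores 100.
print(scale_to_100([2, 5, 10, 4]))  # [20, 50, 100, 40]
```

Because every value is relative to the peak, the index shows changes in interest, not the actual number of searches, as the footnote to this post explains.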

The chart begins Dec. 31, 2019, when China first reported the new virus to the world, according to a timeline in the New York Times. Searches for information on the flu have not changed much in response to virus news, but the opposite is true for Coronavirus.

The first Coronavirus case in the United States was reported on Jan. 21, 2020, the day the first spike in Ohio searches begins. The Trump administration announced restrictions on travel from China on Jan. 31, 2020, which was followed by the rapid decline in searches that ended the first spike.

The second spike in Coronavirus searches began Feb. 23, 2020, the day that authorities in Italy responded to a major outbreak by shutting down some Italian towns. The next day the Trump administration asked Congress for $1.25 billion to combat the virus in the United States. Searches in Ohio have been spiking ever since.

The trends in Ohio show journalists throughout the state should have been covering the story no later than Jan. 21.

I live in Athens, home to Ohio University, which has extensive international connections and a medical school. So I've been surprised by the lack of coverage in two local newspapers that claim to serve the community. The first Coronavirus story that I read in either paper was just published in The Athens News three days ago, March 4, 2020. I might have missed some earlier stories, but only because there were few or none.

Ohio is not uniquely interested in Coronavirus. Google data shows interest across the United States coinciding with the same major developments in the Coronavirus story.




Journalists can develop unique local stories

Coronavirus is a complicated story involving science, public health, politics, and local jobs and businesses. So many local journalists will probably have to learn a lot of new information at the same time they are covering the story.

Repeating information that is already available on the internet will not make local stories competitive. Local journalists must provide new and valuable information to attract and hold audience attention.


Fortunately, the internet also gives local journalists direct access to the global conversation among experts trying to contain the virus. This makes it possible to quickly find accurate information that can be used to develop differentiated local stories. 

For example, former FDA Commissioner Dr. Scott Gottlieb has warned that local health departments and hospitals might be rapidly overwhelmed if the virus becomes epidemic. His concerns are discussed on his Twitter feed (@ScottGottliebMD), which also references his op-eds in the Wall Street Journal and elsewhere.


Local journalists who publish stories on the limited resources available to fight an epidemic are likely to attract an audience that will stay with them for additional coverage. 


Journalists risk losing audiences if they don't cover this story

Local news organizations have limited staff. Many local newspapers are struggling financially. But journalists who don't re-order priorities to provide continuing coverage of the Coronavirus risk making those problems worse.

When someone is concerned or frightened they keep looking until they find information that answers their concern. Someone who cannot get local Coronavirus information from their community newspaper or television station will go elsewhere to find what they need. They might never return.


The Coronavirus is a major test of credibility for local journalists. But the virus is also an opportunity for journalists to show audiences why their work matters. I hope journalists pass this test.


[1] According to Google, trends data is based on representative samples of all searches on a topic. The samples are used to create an index measuring the proportion of searches on a topic. Increases/decreases mean a larger/smaller proportion of searches in Ohio or the United States were about Coronavirus or the flu. This shows increases/decreases in interest about a topic. Charts do not show the actual number of searches.


Monday, April 25, 2016

How to evaluate Gannett's offer to buy Tribune Publishing

Gannett revealed today that it wants to buy the Tribune Publishing company for about $815 million. The offer is about 63% higher than the Tribune company's closing stock price last Friday, April 22.


Stock prices for Gannett (blue) and Tribune Publishing (orange) suggest an advantage for Gannett now that its offer to buy the Tribune company is public. Prices April 2015-April 2016, from MSN Money.


Gannett says it's a cash deal, implying Gannett won't take on debt. That reduces Gannett's risk if the merger doesn’t generate substantial profits. Gannett and Tribune Publishing are old-line newspaper companies that have been transformed by digital competition. Gannett is probably well aware that other once-profitable newspaper companies failed after taking on enormous debt to finance mergers in the early years of this century.


Gannett, and some analysts, claim the merger will generate millions of dollars in “synergies,” which means reduced production costs. That is easy to say, but hard to do.

Focus instead on the value of the Tribune company assets. Are those assets undervalued at the Tribune’s current stock price? Is Gannett’s offer price still below the book value of the assets?

If undervalued assets are a factor, that may explain why, according to Gannett, the Tribune has been reluctant to negotiate a sale. Changes in the Tribune's ownership and board of directors may be influencing the company's response to Gannett's offer. But we should also ask if Tribune executives have evidence that Gannett’s 63% premium is still less than the underlying value of their company.

Sixty-three percent surely sounds good to Tribune stockholders, which is why Gannett went public. But Gannett’s offer might not be the best available deal.

Focus also on the local markets where a merger might consolidate the ownership of local media that currently compete with each other. Consolidation would reduce the elasticity of advertiser demand. If local audiences have multiple media choices that are all owned by Gannett, that will make it easier for Gannett to sell advertising in those markets.

In a classic economic model, consolidation creates the possibility of increased power to raise prices for the owner that dominates the market. But Google, Facebook and other new media also sell local ads in local markets. One possibility is that Gannett only hopes to gain enough pricing power to become profitable in these markets.

In any case, a post-merger Gannett will have to manage audience demand. Audiences increasingly consume news only on social media sites like Facebook. The fraction of the audience that leaves social media to visit news company sites doesn’t stay long or visit often.

It’s true, and often overlooked, that about half of newspaper readers still only read the print edition. But the trends are clear: more and more people are consuming news online or on social media.

Gannett will face the tricky problem of (a) trying to stop the migration of audiences away from its traditional or digital media platforms while (b) trying to persuade social media audiences to engage with those platforms.

So, the financials of the proposed deal may favor Gannett. But managing the merger to produce anticipated cost reductions, pricing power, or profits is likely to be challenging.

(A version of this post first appeared on my Twitter account @HughJMartinPhd)










Wednesday, May 7, 2014

Glimpse of Tumblr is reminder new media business model isn't generating many jobs

There is still a lot we don’t know about the new media business model, so even a glimpse of the model’s inner workings can be valuable. A New York Times article about Tumblr offers such a glimpse, which shows the company is unlikely to generate a significant number of media jobs.
 
There is a great deal of fascination about companies like Tumblr, a popular blogging platform that Yahoo purchased last year for a reported $1.1 billion, mostly cash. Tumblr wasn’t profitable, but Yahoo did acquire millions of Tumblr bloggers to add to Yahoo’s user base. Yahoo is developing ways to distribute advertising aimed at Tumblr users.
 
Tumblr, like other new media companies, has some superficial similarities to traditional media companies. Both new and traditional media publish content that attracts an audience, then sell advertisers access to that audience. But the similarities end there.
 
New media companies like Tumblr don’t pay for the content they need to exist -- blog posts (including pornography). Traditional media companies do pay for content, which increases their production costs.
 
New media companies like Tumblr also rely on automation -- computers and computer software -- to provide a platform for the production and distribution of the content they use. Traditional media companies cannot easily develop similar platforms because millions of potential users have already selected new media platforms for blogging and other Internet activities.
 
The new media business model relies on free content and automation to keep costs low, otherwise these companies would go out of business. That is because new media companies generate very small per-unit revenue from Internet advertising. These companies must keep their per-unit costs low if they want to generate enough money to survive.
 
The Times article reports that Tumblr doubled its staff, but still employs only 220 people. As of today, Tumblr claims it has 185 million blogs. That is about 841,000 blogs for each employee. If Tumblr expands to 500 employees, it will have 370,000 blogs for each employee. Even if activity on the blogs varies, these numbers show the kind of astonishing productivity that new media companies enjoy because of their reliance on automation.
 
The low per-unit revenue at new media companies means they must also attract a very large number of users before they can generate enough profit to justify the high values that new media companies receive from financial markets. Traditional media companies have much lower values in financial markets, but traditional media still generate high enough revenue-per-unit to survive without an audience in the hundreds of millions.
 
For example, Tumblr’s enormous number of blogs means it has to generate average revenue-per-blog of just $5.95 to match its $1.1 billion purchase price.
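These back-of-the-envelope figures are easy to verify with a quick calculation from the numbers reported above:

```python
# Checking the per-employee and per-blog arithmetic cited above.
blogs = 185_000_000              # blogs Tumblr claimed at the time
employees = 220                  # staff reported by The Times
purchase_price = 1_100_000_000   # Yahoo's reported purchase price

print(round(blogs / employees))           # 840909, about 841,000 blogs per employee
print(round(blogs / 500))                 # 370000 blogs per employee with 500 staff
print(round(purchase_price / blogs, 2))   # 5.95 dollars of revenue-per-blog
```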
 
However, Tumblr still isn’t generating enough revenue to develop “a working business model” according to The Times. (Yahoo hasn’t broken out figures for Tumblr in Yahoo’s most recent financial reports).
 
Yahoo is still trying to develop advertising that won’t disturb the Tumblr ethos, which rejects advertising. I suspect Yahoo is also developing ways to generate revenue from data about Tumblr users even though Yahoo only requires an e-mail address to identify each Tumblr user.
 
This suggests one more thing the glimpse tells us about the new media business model. Small per-unit revenue means these companies require enormous numbers of users to generate enough revenue to become profitable. But sometimes, even a large number of users and a very small number of employees won’t be enough for a new media company to become profitable.

Saturday, April 19, 2014

Newspaper industry revenue declines have slowed, but revenue hasn’t stabilized



Some analysts argue declines in newspaper industry revenue are finally stabilizing. The Newspaper Association of America has released the 2013 revenue figures. I’ve prepared two pictures to see if revenues have stabilized.

As readers of this blog know, the business model for Internet-based news organizations is unlikely to replace the thousands of journalism jobs that are vanishing at traditional media organizations. Newspapers have always been the major source of jobs for journalists who produce local news across the U.S. Stabilizing newspaper revenue is critical for preserving some of the jobs that are left.

The chart above shows inflation-adjusted advertising and circulation revenues since 1991. Print advertising revenues peaked in 2000 at about $38 billion, followed by a brief decline.

In 2003 the industry added digital advertising revenue generated by newspaper websites to its advertising figures. Revenue increased slightly that year, and stabilized until 2006.

In 2007 the last recession began, and advertising revenue began a steep decline that devastated the newspaper industry. The recession exacerbated an underlying trend caused by the shift of advertising to websites and search engines. In 2010 the decline slowed, but digital advertising only accounted for 13% of the industry’s total ad revenue that year.

The steep decline in print ad revenue and the small gains from digital ad revenue forced the industry to reconsider its revenue strategies. In 2011, the industry added to its advertising base revenue from niche publications, direct marketing, and non-daily publications. Total ad revenue increased slightly that year, but declined again in 2012 and 2013.
 
The industry generated about $13.7 billion in inflation-adjusted ad revenue last year, or less than half of its peak revenue in 2000.
 
The chart above shows inflation-adjusted circulation revenue has been less volatile. In 1991 newspapers generated $8.6 billion from print subscriptions and single-copy sales. In 2013 newspapers generated an inflation-adjusted $6.3 billion from print and digital subscriptions and single-copy sales.

I’ve written before about the importance of charging subscriptions for access to newspaper websites and mobile applications. Digital subscriptions can slow the loss of print subscribers who will otherwise switch to free access on the Internet. Preserving print circulation is critical because print advertising still accounts for the largest portion of industry revenue.

Some newspapers now offer discounts to encourage subscribers to select bundled digital and print subscriptions. These bundled subscriptions are designed to slow or stabilize declines in print circulation.

Digital subscriptions can also generate new revenue to offset some of the losses from print advertising revenue. And the first chart does show that subscription revenue has stabilized, which may be partly due to double-digit growth in digital subscriptions.[1]

The first chart also provides perspective on the second chart, which shows the ratio of circulation revenue to advertising revenue since 1991.



This ratio declined throughout the 1990s when newspaper advertising revenue enjoyed its last period of sustained growth. In 2007, the ratio began a pronounced increase that continued until 2013. If the trend continues, newspapers will generate $1 from circulation for every $2 from advertising in the next year or two.
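Using the inflation-adjusted 2013 figures quoted earlier in this post ($13.7 billion in advertising, $6.3 billion in circulation), the ratio is easy to compute:

```python
# Circulation/advertising ratio for 2013, from figures in this post.
ad_revenue = 13.7     # billions of dollars, inflation-adjusted
circ_revenue = 6.3    # billions of dollars, inflation-adjusted

ratio = circ_revenue / ad_revenue
print(round(ratio, 2))  # 0.46: about $1 of circulation per $2.17 of advertising
```

A continued rise toward 0.5 would mean $1 from circulation for every $2 from advertising.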

A naïve reading of this chart would suggest that circulation is on track to replace the advertising revenue that newspapers are losing. But the first chart shows the increase in the circulation/advertising ratio is mostly the result of steep declines in ad revenue. This increases the industry's reliance on circulation, but circulation revenue is not yet increasing enough to replace the ad revenue that is being lost.

So I’m not ready to agree that newspaper revenues have stabilized.

Advertising is still the industry’s primary source of revenue. Ad revenue declines have slowed, but they have not ended. Circulation revenue appears stable, but it still cannot replace the ad revenues that continue to disappear.

Even if inflation-adjusted revenue was steady from year to year, the industry would still be falling behind. An industry has to grow faster than inflation to be considered truly healthy.

I do expect that if enough newspapers adopt economically sensible digital subscriptions, those subscriptions will help stabilize industry revenues. But it’s going to take a couple more years of data before we can tell if that is happening.
[1] The amount of revenue from digital subscriptions has not been released. The report only includes percentage changes in revenue. Without the base numbers, it's impossible to know how much real growth has occurred.

Wednesday, April 16, 2014

Twitter's purchase of data firm an example of a new kind of media competition

I’ve been expecting Twitter to assert more control over the enormous volume of data generated by its users because the data can be sold to businesses, government agencies and academics. Yesterday, Twitter took a notable step to assert that control when it announced the purchase of Gnip, a company that packages and sells data generated from more than 500 million Tweets that are posted each day.

Twitter was losing money when it began selling stock last November, so Twitter has to show investors that it’s doing everything possible to make a profit. Twitter vice-president Jana Messerschmidt explained why Gnip will help Twitter generate new revenue:

“Every day Twitter users share and discuss their interests and what’s happening in the world. These public Tweets can reveal a wide variety of insights — so much so that academic institutions, journalists, marketers, brands, politicians and developers regularly use aggregated Twitter data to spot trends, analyze sentiment, find breaking news, connect with customers and much more.”
 
Twitter also generates revenue by selling advertising. But Twitter knows that we are at the beginning of a data analysis revolution that already generates valuable insights for “hundreds of clients” in business, government, the media, and academe. Those clients will now be paying Twitter for access to Tweets and the analytical tools developed by Gnip.
 
Twitter is under particular pressure from investors to generate revenue because the initial price of its stock rose to levels that some analysts considered too high for a company that was losing money.

But I suspect investors paid those high prices because they believe Twitter is likely to be one of three companies -- Google and Facebook are the other two -- that will dominate Western Internet markets for information and advertising. When a small number of companies dominate a market, those companies may have pricing power -- they can increase prices far above production costs to generate high profits.[1]

One way for Twitter to reassure investors is to take steps that rapidly increase revenue, which is why I expected the company to assert more control over the valuable data on its network. Investors apparently viewed yesterday’s announcement the same way – Twitter’s stock price increased over 11 percent, which Reuters called the largest stock price increase since the company went public.

Gnip is not the only company that purchased access to Tweets and then re-sold them to its clients. Twitter’s statement did not specify how its relationship with other Twitter data providers might change.

However, it’s unlikely that independent Twitter data providers will be allowed to compete away a significant share of the revenue that Twitter gains from purchasing Gnip. At the same time, it will be interesting to see if Gnip continues to offer data from Tumblr, which is owned by Yahoo.

Media companies have always competed for audiences and advertising. Twitter’s purchase of Gnip is an example of a new kind of media competition. The new prize is an endless stream of raw information detailing the characteristics, behavior and preferences of millions of people who use the company’s products.

[1] An oligopoly exists if a company can set a price above its costs, but must then account for the reaction of rival companies in the same market. This makes it possible for companies to earn economic rents, which are profits above the level of returns that could be earned from comparable investments elsewhere. However, economic rents are not assured. A rival firm might offer advertising and information at lower prices, triggering a price competition that ends when prices are just high enough to cover production costs.

Monday, April 14, 2014

Pulitzer Prizes an occasion to consider relationship of newspaper quality to economic success


The Pulitzer Prizes announced today offer a chance to consider three studies of the links between journalistic quality and economic success. The Pulitzers, newspaper journalism’s most prestigious awards for excellence, go to a small number of newspapers and websites each year.
 
Brian Logan and Daniel Sutter found newspapers that had won Pulitzer Prizes also had “significantly higher circulation [than newspapers without Pulitzers], even when controlling for the economic and demographic characteristics and media competition of the metropolitan area” where the newspaper circulated.
 
Publishers will only spend money to produce prize-winning journalism if the journalism pays for itself by increasing circulation and generating extra revenue. Increases in circulation at Pulitzer winning papers were probably large enough to generate that extra revenue, the study concluded.
 
Logan and Sutter argued that Pulitzer Prizes are an important “signal of quality” for consumers. News is what economists call a credence good. Consumers cannot evaluate the true quality of a credence good even after they have consumed the good. For example, consumers have no way to tell if the information in a news article is accurate. This forces consumers to evaluate quality based on a newspaper’s reputation, including the prizes it has won.

This was a careful study, but it used circulation data from 1997. We need newer studies that account for the shift of audiences to the Internet before we can be sure the Pulitzers are still associated with significantly larger audiences. The second and third studies are from a line of research that examines the overall quality of news instead of focusing on Pulitzer Prizes.
 
These studies use the financial commitment model. This model states newspapers facing competition will increase their newsroom spending, or financial commitment to news. Increased spending results in a larger newspaper staff and/or an increase in the variety and depth of news that is published. As the quality of news increases, consumers receive more utility from reading the newspaper. This in turn leads to increases in the newspaper’s circulation and/or advertising revenues.
 
In the second study Stephen Lacy and I reviewed decades of research that supports the financial commitment model. We focused on newspaper reactions to declines in circulation as readers and advertisers shifted to news published on the Internet.
 
Papers might use any of three strategies to maintain profits when circulation declines. First, newspapers might try to offset circulation declines by increasing advertising prices. Second, newspapers might leave ad prices unchanged, which amounts to an increase since advertisers are paying to reach fewer readers. Third, newspapers might cut their newsroom costs and reduce the quality of their news.

We concluded newspapers that raised ad prices or reduced quality would probably accelerate the loss of circulation. However, newspapers that published quality content might stabilize or slow declines in circulation.


The third study had unusual access to 12 years of internal revenue and circulation data from an individual newspaper. The study looked at newsroom spending, subscription revenue, and advertising revenue from the print and online editions of the newspaper.
 
The study used subscription revenue instead of circulation because advertisers value subscribers more than they value readers who don’t pay for the paper. Subscribers are more likely to read the paper carefully and register advertising messages, argued authors Yihui Tang, Shrihari Sridhar, Esther Thorson and Murali K. Mantrala.
 
Results showed increased newsroom spending resulted in increased subscriptions to the newspaper. The subscription increases then resulted in increased print and online advertising revenues. A simulation showed opposite effects -- reductions in newsroom spending could lead to reductions in subscriptions, resulting in reductions in both online and print advertising revenues.
 
These last two studies accounted for the Internet. However, the three studies are just a beginning.
 
Many newspapers still rely on print editions to generate the bulk of their advertising and subscription revenues. Online revenue is a distant second when it comes to generating profits. More empirical research is needed to produce additional recommendations that can help newspaper managers who are trying to survive in this difficult environment.