Blog Profile / The Endeavour


URL: http://www.johndcook.com/blog/
Filed Under: Academics
Posts on Regator: 1524
Posts / Week: 4.6
Archived Since: April 26, 2011

Blog Post Archive

Adding Laplace or Gaussian noise to database for privacy

In the previous two posts we looked at a randomization scheme for protecting the privacy of a binary response. This post will look briefly at adding noise to continuous or unbounded data. I like to keep the posts here fairly short, but this topic is fairly technical. To keep it short I’ll omit some of […]

Quantifying privacy loss in a statistical database

In the previous post we looked at a simple randomization procedure to obscure individual responses to yes/no questions in a way that retains the statistical usefulness of the data. In this post we’ll generalize that procedure, quantify the privacy loss, and discuss the utility/privacy trade-off. More general randomized response Suppose we have a binary response […]

Randomized response, privacy, and Bayes theorem

Suppose you want to gather data on an incriminating question. For example, maybe a statistics professor would like to know how many students cheated on a test. Being a statistician, the professor has a clever way to find out what he wants to know while giving each student deniability. Randomized response Each student is asked […]

Why don’t you simply use XeTeX?

From an FAQ post I wrote a few years ago: This may seem like an odd question, but it’s actually one I get very often. On my TeXtip twitter account, I include tips on how to create non-English characters such as using \AA to produce Å. Every time someone will ask “Why not use XeTeX and just […]

Pascal’s triangle and Fermat’s little theorem

I was listening to My Favorite Theorem when Jordan Ellenberg said something in passing about proving Fermat’s little theorem from Pascal’s triangle. I wasn’t familiar with that, and fortunately Evelyn Lamb wasn’t either and so she asked him to explain. Fermat’s little theorem says that for any prime p, then for any integer a, a^p = a […]
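The excerpt above is cut off before the argument, so the following is only a minimal sketch of the two facts it alludes to, not the post's own code. It checks numerically that a to the power p is congruent to a mod p for a few small primes, and that the interior entries of row p of Pascal's triangle are all divisible by p, which is the property the usual Pascal's triangle proof relies on (an assumption about which proof is meant). The function names are hypothetical.

```python
from math import comb

def fermat_holds(a, p):
    # Fermat's little theorem: a**p is congruent to a modulo a prime p.
    return pow(a, p, p) == a % p

def interior_row_divisible(n):
    # True when every interior entry C(n, k), 0 < k < n, of row n of
    # Pascal's triangle is divisible by n. This holds when n is prime.
    return all(comb(n, k) % n == 0 for k in range(1, n))

small_primes = [2, 3, 5, 7, 11, 13]

# Check the congruence for a range of integers, including negatives.
all_ok = all(fermat_holds(a, p) for p in small_primes for a in range(-20, 21))

# Check the divisibility property for the same primes.
rows_ok = all(interior_row_divisible(p) for p in small_primes)
```

For a composite row such as n = 4 the divisibility property fails, since C(4, 2) = 6 is not a multiple of 4.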

Making a problem easier by making it harder

In the oral exam for my PhD, my advisor asked me a question about a differential equation. I don’t recall the question, but I remember the interaction that followed. I was stuck, and my advisor countered by saying “Let me ask you a harder question.” I was still stuck, and so he said “Let me […]

Quantifying the information content of personal data

It can be surprisingly easy to identify someone from data that’s not directly identifiable. One commonly cited result is that the combination of birth date, zip code, and sex is enough to identify most people. This post will look at how to quantify the amount of information contained in such data. If the answer to […]

Negative correlation introduced by success

Suppose you measure people on two independent attributes, X and Y, and take those for whom X+Y is above some threshold. Then even though X and Y are uncorrelated in the full population, they will be negatively correlated in your sample. This article gives the following example. Suppose beauty and acting ability were uncorrelated. Knowing how […]
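A small simulation can illustrate the selection effect described above. This is a sketch under stated assumptions: X and Y are taken to be independent standard normal variables and the threshold is set to 1, neither of which comes from the post. The helper names are hypothetical.

```python
import random

def pearson(xs, ys):
    # Sample Pearson correlation coefficient of two equal-length sequences.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(n=100_000, threshold=1.0, seed=0):
    # Draw independent X and Y (assumed standard normal), then keep only
    # the pairs whose sum exceeds the threshold.
    rng = random.Random(seed)
    pairs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    selected = [(x, y) for x, y in pairs if x + y > threshold]
    full_corr = pearson(*zip(*pairs))
    selected_corr = pearson(*zip(*selected))
    return full_corr, selected_corr

full_corr, selected_corr = simulate()
print(f"full population: {full_corr:.3f}, selected sample: {selected_corr:.3f}")
```

The correlation in the full population should come out near zero, while the correlation among the selected pairs should be clearly negative: once X+Y is known to be large, a low X has to be offset by a high Y, and vice versa.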

Highly cited theorems

Some theorems are cited far more often than others. These are not the most striking theorems, not the most advanced or most elegant, but ones that are extraordinarily useful. I first noticed this when taking complex analysis where the Cauchy integral formula comes up over and over. When I first saw the formula I thought […]

Width of mixture PDFs

Suppose you draw samples from two populations, one of which has a wider probability distribution than the other. How does the width of the distribution of the combined sample vary as you change the proportions of the two populations? The extremes are easy. If you pick only from one population, then the resulting distribution will […]

Team dynamics and encouragement

When you add people to a project, the total productivity of the team as a whole may go up, but the productivity per person usually goes down. Someone suggested that as a rule of thumb, a company needs to triple its number of employees to double its productivity. Fred Brooks summarized this saying “Many hands […]

Relearning from a new perspective

I had a conversation with someone today who said he’s relearning logic from a categorical perspective. What struck me about this was not the specifics but the pattern: Relearning _______ from a _______ perspective. Not relearning something forgotten, but going back over something you already know well, but from a different starting point, a different […]

Hurricane Harvey update

As you may know, I live in the darkest region of the rainfall map below. My family and I are doing fine. Our house has not flooded, and at this point it looks like it will not flood. We’ve only lost electricity for a second or two. Of course not everyone in Houston is doing […]

Defining the Fourier transform on LCA groups

My previous post said that all the familiar variations on Fourier transforms—Fourier series analysis and synthesis, Fourier transforms on the real line, discrete Fourier transforms, etc.—can be unified into a single theory. They’re all instances of a Fourier transform on a locally compact Abelian (LCA) group. The difference between them is the underlying group. Given […]

Unified theory of Fourier transforms

You can take a periodic function and analyze it into its Fourier coefficients, or use the Fourier coefficients in a sum to synthesize a periodic function. You can take the Fourier transform of a function defined on the whole real line and get another such function. And you can compute the discrete Fourier transform via […]

Solving problems we wish we had

There’s a great line from Heather McGaw toward the end of the latest episode of 99 Percent Invisible: Sometimes … we can start to solve problems that we wish were problems because they’re easy to solve. Reminds me of an excerpt from Richard Weaver’s book Ideas Have Consequences: Obsession, according to the canons of psychology, […]

Predicting a LCG output

A few days ago I wrote about how to pick the seed of a simple random number generator so that a desired output came n values later. The number n was fixed and we varied the seed. In this post, the seed will be fixed and we’ll solve for n. In other words, we ask when a […]

Programming language life expectancy

The Lindy effect says that what’s been around the longest is likely to remain around the longest. It applies to creative artifacts, not living things. A puppy is likely to live longer than an elderly dog, but a book that has been in press for a century is likely to be in press for another century. […]

Reverse engineering the seed of a linear congruential generator

The previous post gave an example of manipulating the seed of a random number generator to produce a desired result. This post will do something similar for a different generator. A couple times I’ve used the following LCG (linear congruential random number generator) in examples. An LCG starts with an initial value of z and updates z […]

Manipulating a random number generator

With some random number generators, it’s possible to select the seed carefully to manipulate the output. Sometimes this is easy to do. Sometimes it’s hard but doable. Sometimes it’s theoretically possible but practically impossible. In my recent correspondence with Melissa O’Neill, she gave me an example that seeds a random number generator so that the […]
