Suppose you have data from a discrete power law with exponent α. That is, the probability of an outcome n is proportional to n^(-α). How can you recover α? A naive approach would be to gloss over the fact that you have discrete data and use the MLE (maximum likelihood estimator) for continuous data. That […]

Here are wordclouds for some of my most popular Twitter accounts. Thanks to Mike Croucher for creating these images. He explains on his blog how to create your own Twitter wordclouds using R. My most popular account is CompSciFact, which tweets about computer science and related topics. AlgebraFact is for algebra, number theory, and miscellaneous pure […]

David Mumford wrote a blog post a few weeks ago in which he identified four tribes of mathematicians. Here’s a summary of his description of the four tribes. Explorers are people who ask — are there objects with such and such properties and if so, how many? … Alchemists … are those whose greatest excitement comes from […]

Today I needed to compute the derivative of the zeta function. SciPy implements the zeta function, but not its derivative, so I needed to write my own version. The most obvious way to approximate a derivative would be to simply stick a small step size into the definition of derivative: f’(x) ≈ (f(x+h) - f(x)) / […]
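As a minimal sketch of the "obvious" approach described in that excerpt, here is a forward difference in Python. The function name `forward_difference` and the default step `h=1e-6` are assumptions chosen for illustration, not taken from the post, and the example checks the approximation against `math.sin` (whose derivative is known) rather than the zeta function.

```python
import math

def forward_difference(f, x, h=1e-6):
    """Approximate f'(x) by plugging a small step h into the
    definition of the derivative: (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

# Illustration on a function with a known derivative: d/dx sin(x) = cos(x).
x = 1.0
approx = forward_difference(math.sin, x)
exact = math.cos(x)
error = abs(approx - exact)
print(approx, exact, error)
```

The same helper could be passed any callable of one variable, for example a one-argument wrapper around SciPy's zeta function. The error of this scheme shrinks only linearly with h, and a very small h amplifies rounding error.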

“Ever since Euclid, mathematical proofs have served a dual purpose: certifying that a statement is true and explaining why it is true. In the future these two epistemological functions may be divorced. In the future, the computer assistant may take care of the certification and leave the mathematician to look for an explanation that humans […]

Anthony Scopatz did an interview for Podcast.__init__ recently talking about xonsh, a command shell that blends Python and some traditions from bash. One line from the interview jumped out at me: … thinking very critically about what shells get used for and what they’re actually good at and what they’re not good at. I’ve […]

Hilary Mason made an important observation on Twitter a few days ago: You do not want to be an edge case in this future we are building. Systems run by algorithms can be more efficient on average, but make life harder on the edge cases, people who are exceptions to the system developers’ expectations. Algorithms, whether encoded in software or […]

Large companies take longer to start projects. How much longer? A plausible guess is that project lead time would be proportional to the logarithm of the company size. If a company with n employees has a hierarchy with every manager having m subordinates, the number of management layers would be around log_m(n). If every project has […]
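A rough sketch of the layer count in that excerpt, assuming an idealized hierarchy in which every manager has exactly m subordinates. The function name `management_layers` and the sample figures (10,000 employees, 10 subordinates per manager) are hypothetical, chosen only to illustrate the formula.

```python
import math

def management_layers(n, m):
    """Approximate number of management layers, log base m of n,
    for n employees when every manager has m subordinates."""
    return math.log(n) / math.log(m)

# Hypothetical example: 10,000 employees, 10 subordinates per manager.
layers = management_layers(10_000, 10)
print(round(layers, 6))
```

Under this model, multiplying headcount by m adds only one layer, which is why lead time tied to the number of layers would grow logarithmically with company size.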

Four years ago I wrote about the wildfires in Bastrop, Texas. Here’s a photo from the time by Kerri West, used by permission. Today I visited Bastrop State Park on the way home from Austin. Some trees, particularly oaks, survived the fires. Pines have come back on their own in parts of the park. A volunteer working […]

A medical device company approached me with the following problem. Scientists had written academic journal articles about their product, but the sales force couldn’t understand what they said. My task was to read the articles, then tell the people in sales what the articles were saying in layman’s terms. One of the questions that came […]

The article Deming, data and observational studies by S. Stanley Young and Alan Karr opens with Any claim coming from an observational study is most likely to be wrong. They back up this assertion with data about observational studies later contradicted by prospective studies.

It’s hard to transfer intellectual property. When I was managing software projects, it would take months to fully transfer a project from one person to another. This was with full access to and encouragement from the original developer. This was a transfer between peers, both part of the same environment with all its institutional memory. […]

The Insight 2015 conference highlighted some impressive applications of big data: predicting the path of hurricanes more accurately (as we saw with hurricane Patricia), improving the performance of athletes, making cars safer, etc. These applications involve large amounts of data.

A/B testing, or split testing, is commonly used in web marketing to decide which of two design options performs better. If you have so many visitors to a site that the number of visitors used in a test is negligible, conventional randomization schemes are the way to go. They’re simple and effective. But if you […]

A few weeks ago I got a message on Twitter saying that IBM’s Watson had identified me as an “influencer” and invited me to the company’s Insight 2015 conference. So that’s where I am this week. I had a brief interview last night. Someone took this photo as we were setting up.

The distance between the Earth and Mars depends on their relative positions in their orbits and varies quite a bit over time. This post will show how to compute the approximate distance over time. We’re primarily interested in Earth and Mars, though this shows how to calculate the distance between any two planets. The planets […]

You may expect that a burst of input will cause a burst of output. Sometimes that’s the case, but often a burst of input results in a long, smoothly decreasing succession of output. You may not get immediate results, but long-term results. This is true of life in general, but it’s also true in a precise sense of differential equations. […]

Suppose a test asks you to place 10 events in chronological order. Label these events A through J so that chronological order is also alphabetical order. If a student answers BACDEFGHIJ, then did they make two mistakes or just one? Two events are in the wrong position, but they made one transposition error. The simplest way […]

Electric lighting has changed the way we sleep, encouraging us to lose sleep by staying awake much longer after dark than we otherwise would. Or maybe not. A new study of three contemporary hunter-gatherer tribes found that they stay awake long after dark and sleep an average of 6.5 hours a night. They also don’t nap […]

Here’s an unusual formula for pi based on the product and least common multiple of the first m Fibonacci numbers. Unlike the formula I wrote about a few days ago relating Fibonacci numbers and pi, this one is not as simple to prove. The numerator inside the root is easy enough to estimate asymptotically, […]

