For an arbitrary set of data points, what’s the ‘best’ graph axis for those points? Say that your x-variable runs from 5 to 45. A human can quickly pick a few promising options for the x-axis. It could have ticks at 0, 15, 30, and 45. Or perhaps 0, 10, 20, 30, 40, and 50. Getting a program to generate ‘nice’ options like this is a bit trickier. I’ve been working on a graphing app recently and I’ve reproduced my solution, with notes, below.

Although there are a number of algorithms people have put forth on Stack Overflow and elsewhere, many of these do not handle certain kinds of data sets correctly, and virtually none treat 0 correctly in my opinion. I started from scratch with the following 5 rules for my function:

- If an axis crosses 0, 0 *must* be an axis tick value
- Axis ticks *must* be attached to a grid line (i.e. you can’t have x-axis ticks floating in space – they must be attached to a y-axis grid line as you see in the example above)
- The data should be as tightly contained as possible (little wasted space)
- There should be no fewer than 4 ticks on each axis, and if rule 2 requires it, up to 10
- Numbers should be ‘nice’ (round numbers etc.)

My Javascript algorithm is below with comments. The basic approach is to generate a large number of candidate axes, score each axis based on the amount of wasted space and the number of ticks used, and then compare these scores. It is a bit brute force in this way. Computation is cheap! It takes an array of data values as input – for example [0, 12, 8, 20, 5] or whatever.^{1}

First, get the minimum and maximum of the series, toggle the zero_flag variable if 0 is between the min and max, and get the range of the data.

function create_axis(dataseries) { maxdata=Math.max(...dataseries) mindata=Math.min(...dataseries) if(mindata<=0 && maxdata>=0){ zero_flag=1 } range=maxdata-mindata

Next, define ‘nice’ numbers. You could change this if you’d like to include other possibilities, but I decided I would allow counting by 1s, 2s, 5s, 10s, 15s, 25s, and 75s. This will make a bit more sense below.

nice_ticks=[.1,.2,.5,1,.15,.25,.75]

This next part is a bit of path dependence – I had an algorithm where the number of ticks was more central and I kept that framework but I probably wouldn’t do it this way again. I get a naive value for the distance between ticks and determine the place value of this distance.

steps=range/(ticks-1) if(steps>=1){ rounded=Math.round(steps) digits=rounded.toString().length } else { places=steps.toString().split('.')[1] first_place=0 for(var i=0;i<places.length;i++){ if (places[i]!='0' && first_place==0){ first_place=i } } digits=-parseInt(first_place) }

Now using the value of digits (the place value of steps), generate a list of candidate steps. These are just the values from nice_steps above multiplied by powers of 10 according to the place value of digits. Because computation doesn’t matter to me, I check 10^place value+1, 10^place value, and 10^place value-1. Most of these candidate step lengths will be terrible but it doesn’t matter – they will get weeded out in the next step. If our initial step length was 13, candidate steps will be generated by taking 1, 10 and 100 * all values of nice_ticks. So 13 would result in candidate steps: [.1,.2,.5,1,.15,.25,.75,1,2,5,10,2.5,2.5,7.5,20,50,100,15,25,75]

candidate_steps=[] for (var i=0;i<nice_ticks.length;i++){ candidate_steps.push(nice_ticks[i]*Math.pow(10,digits)) candidate_steps.push(nice_ticks[i]*Math.pow(10,digits-1)) candidate_steps.push(nice_ticks[i]*Math.pow(10,digits+1)) }

Loop through candidate steps and generate an axis based on each step length.

for (var i=0;i<candidate_steps.length;i++){ steps=parseFloat(candidate_steps[i]) // starting value depends on whether or not 0 is in the array if(zero_flag==1){ min_steps=Math.ceil(Math.abs(mindata)/steps) step_array=[-min_steps*steps] } else { step_array=[Math.floor(mindata/steps)*steps] } var stepnum=1 while (step_array[step_array.length-1]<maxdata){ step_array.push((parseFloat(step_array[0])+steps*stepnum)) stepnum++ } // this arbitrarily enforces step_arrays of length between 4 and 10 if (step_array.length<11 && step_array.length>4){candidate_arrays.push(step_array)} } } }

All that remains is to score all the candidate arrays. I’m not going to include my scorer, because there are a lot of arbitrary choices involved, but basically I look at how much space each array wastes compared to the data use that as a starting value. Each array gets the score 10^percent wasted space – then I further penalize the array for large values of ticks, tick values that I don’t like as much (.15 for example, is great in certain cases, but probably shouldn’t be liked as much by the function as .1). The array with the lowest score ‘wins’.

- I have excluded parts of the function – to handle dates and certain other edge cases. ↑