The fast-evolving field of artificial intelligence (AI) holds great promise in road transportation as sensing of vehicle motion is incorporated into new vehicles and roadway infrastructure. Although a great deal of big data is available to train these machine-intelligence algorithms, some of it is unclean and biased. Sensing within vehicles and roadway infrastructure may be less susceptible to tampering, but broad data-sets also exist that have been developed from biased human judgment. Their existence demonstrates a past lack of foresight in ensuring that this big data is valid and reliable for AI purposes.
A simple example of the AI developments taking shape is discussed in a 2016 research paper entitled “Road crack detection using deep convolutional neural network”, by Lei Zhang et al., published in the 2016 IEEE International Conference. In essence, images of a road surface were used to detect cracks in that surface for roadway maintenance purposes.
Gorski Consulting's methods of collecting data on roadway surface characteristics have been discussed on this website since 2014, and a “Road Data” database on the site shows the results from the sensed motion of our test vehicle as it travelled over many roads in Southwestern Ontario. Furthermore, the data can be matched to the videos from multiple video cameras that were attached to the test vehicle while the motion data was collected. Such a data-set is unique, clean and unbiased.
Examples of biased data come from the vast numbers of police collision records. Such data is incomplete because little is being done to understand and recognize the large numbers of unreported incidents, which are assumed to be of lower severity and therefore of minimal importance. For the last nine years Gorski Consulting has documented a specific site (Clarke Road north of Fanshawe Park Road in London, Ontario), gathering photographic evidence of collisions and loss-of-control incidents that presents a much clearer picture of the large numbers of collisions and incidents that are not reported in police data. At this site, which includes a complex S-curve, between 80 and 90 percent of incidents have gone unreported in police data. While many of these incidents are minor, some are not, and official records contain no information about how many significant incidents remain unreported. The need to assign blame for collisions further erodes the police data, as the motivation for documentation is not always purely scientific.
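To make the scale of this under-reporting concrete, the short Python sketch below converts an unreported share into an estimate of total incidents. It is only an illustration: the function name `estimated_total` and the police count of 10 are hypothetical and are not figures from the Clarke Road documentation. Only the 80 to 90 percent range comes from the observations described above.

```python
def estimated_total(police_count: int, unreported_share: float) -> float:
    """Estimate total incidents from a police count and an unreported share."""
    if not 0.0 <= unreported_share < 1.0:
        raise ValueError("unreported_share must be in [0, 1)")
    # Police records capture only the reported share (1 - unreported_share),
    # so dividing by that share scales the count up to an estimated total.
    return police_count / (1.0 - unreported_share)


# Hypothetical police count, chosen only to illustrate the arithmetic.
police_count = 10
for share in (0.80, 0.90):
    total = estimated_total(police_count, share)
    print(f"{share:.0%} unreported: about {total:.0f} incidents in total")
```

Under these assumptions, police records at such a site would understate the true number of incidents by a factor of roughly five to ten.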
These issues demonstrate that the usefulness of the rapidly developing areas of AI depends on reliable and valid data that is not always available. Bad data is particularly damaging because it is difficult and time-consuming to separate the good from the bad. It is now clearer that it is of the utmost importance to examine how such data can be eradicated and collection methods improved.