I recently began a new full-time position as a technical writer at Commonwealth Computer Research, Inc., more commonly known as CCRi. CCRi was doing large-scale data science long before the term “data science” became so popular; one of the company’s founders also directs the University of Virginia’s Data Science Institute. CCRi also does a lot of work with distributed machine learning and other cutting-edge technologies, especially in geospatial analytics. After more than eight years of telecommuting, the chance to work right here in Charlottesville with so many interesting new technologies and smart people (engineering and math PhDs are the norm rather than the exception) was just too good to pass up.
Having recently grown to over 80 employees, CCRi has gotten large enough that it’s become difficult for everyone there to know about all the technology and projects going on in other parts of the company. Part of my role will be to help with that by documenting this work, so that people can more easily find connections between existing and new efforts across the company. I’ll also be helping with marketing and business development.
RDF and SPARQL do play a role in some of the projects there, mostly via the Rya triplestore because it uses Apache Accumulo for storage. Accumulo is a key-value NoSQL database built on Hadoop whose design is based on Google’s Bigtable, and it plays an important part in several CCRi projects.
One of the biggest projects at CCRi is GeoMesa, which its product page describes as “an open-source solution maintained and supported by CCRi for storing, indexing, querying, transforming, and visualizing spatio-temporal data at scale in Accumulo.” To start with, it adds to Accumulo what PostGIS adds to PostgreSQL: datatypes, functions, and other features that make it easy to store and query geospatial data. Going beyond that, GeoMesa stores spatio-temporal data, so that event timestamps can play a role in applications built on it. Apache Kafka gives GeoMesa some nice infrastructure for handling real-time streaming data; for example, it was used to create an animated U.S. map of tweets over the 2015 Super Bowl week.
As alternatives to Accumulo for storage, GeoMesa can also use Apache HBase and Google Cloud Bigtable, the public version of Google’s internal Bigtable storage system. After Google heard about this, they contacted CCRi about a partnership, which was exciting enough in this town for a local TV station to run a news story about it. That story is fun, but if you only have a minute and a half to watch a video about GeoMesa, I recommend the “GeoMesa on Google BigTable” video, which shows off some of the excellent visualizations that are possible.
In addition to GeoMesa and the other products described on its website, the company does applied research, often for government agencies. (I’m learning a lot about those; did you know that the U.S. has an Office for Anticipating Surprise?) In this era of Big Data, now that tools for working with such large quantities of data have become more widely available, a common question is how best to make use of all of it. CCRi’s capabilities in areas such as predictive analytics, optimization, and text analysis are helping customers get more out of this data in settings ranging from international sales patterns to battlefields. If you’d like to learn more about the kinds of services CCRi offers, contact me and I’ll be happy to put you in touch with the right people.