TDWI Upside - Where Data Means Business

Preparing Your Company for Machine Learning

To take advantage of machine learning, you'll need a new set of skills. Here are a few recommendations.

My last article on artificial intelligence (AI) and machine learning (ML) concluded that "the split of the necessary AI/ML between the 'edge' of corporate users and the software itself is still to be determined." Several readers have reached out to ask about the tools and skills needed to accomplish the edge computing for their companies.

Many companies will need new skill sets before they can take advantage of machine learning. Here I review the math background and programming languages I have seen work.

Wherever we go in terms of programming, it is unmistakable that math is making a comeback. I'm talking floating-point arithmetic, deep statistics, and linear algebra. At its core, ML applies math to data in order to perform tasks such as prediction and classification. Without great competency in math, ML will remain elusive to the company or the individual.
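To make the role of math concrete, here is a minimal sketch of one of the simplest learning methods, fitting a straight line by ordinary least squares, written in plain Python. The function name fit_line and the hours/scores data are hypothetical and chosen only for illustration; a real project would more likely rely on a library.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # The slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical training data: hours of practice vs. test score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]

slope, intercept = fit_line(hours, scores)

# Use the learned parameters to predict the score for 6 hours.
predicted = slope * 6.0 + intercept
```

Even this small example draws on statistics (means, variance, covariance) and floating-point arithmetic; larger models generalize the same idea with linear algebra over many variables.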

Programming Languages for Machine Learning

Whichever language you choose, many ML workloads, particularly training neural networks, run considerably faster on a GPU than on a CPU.

C and C++ are programming staples at this point. They are difficult to use, require many lines of code, and leave memory management to you, but their performance is undeniable when the code is well written. They are the closest realistic languages to bare-metal programming, and you can implement ML with them.

However, there are better languages for ML than C and C++. Higher-level options such as Python with TensorFlow hand the heavy computation to compiled C and C++ code under the hood, so you can skip some of the complexity and still get the performance in most cases.

Python is the most popular language for ML. Although not the fastest, it is easy to program, and it is a good-enough solution that covers all the bases. We have been using Python for years. Libraries such as NumPy and pandas provide array and data-frame capabilities comparable to those found in MATLAB and R. In short, Python is the full data science workbench.

TensorFlow (open source, from Google) adds a computational/symbolic graph to Python so you can, for example, describe your neural network in Python with TensorFlow and have it executed by TensorFlow's compiled C++ runtime.
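The idea of a computational graph can be sketched in a few lines of plain Python. This is a toy illustration of the "describe first, run later" pattern, not TensorFlow's API; the names Node, constant, placeholder, and run are all hypothetical.

```python
class Node:
    """One operation in the graph; building a Node computes nothing."""

    def __init__(self, op, inputs=(), value=None, name=None):
        self.op = op
        self.inputs = inputs
        self.value = value
        self.name = name

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))


def constant(value):
    return Node("const", value=value)


def placeholder(name):
    # A named input whose value is supplied only at run time.
    return Node("placeholder", name=name)


def run(node, feed):
    """Evaluate the graph recursively, using feed for placeholder values."""
    if node.op == "const":
        return node.value
    if node.op == "placeholder":
        return feed[node.name]
    left, right = (run(child, feed) for child in node.inputs)
    if node.op == "add":
        return left + right
    if node.op == "mul":
        return left * right
    raise ValueError(f"unknown op: {node.op}")


# Describe y = w * x + b symbolically; nothing is calculated yet.
x = placeholder("x")
w = constant(3.0)
b = constant(1.0)
y = w * x + b

# Only now is the graph evaluated, with a concrete value for x.
result = run(y, {"x": 2.0})
```

Because the whole calculation exists as a data structure before it runs, a framework is free to optimize it and hand it to a faster backend, which is the role compiled code plays in the paragraph above.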

R and MATLAB are different from general-purpose procedural and object-oriented languages such as Python. They are optimized for math, with features such as direct slicing and dicing of matrices and rich libraries to draw from.

In this way, R supports ML robustly with fewer lines of code than C or C++. R is open source, but MATLAB is a commercially licensed product used mostly in academia and research. One of our research clients uses MATLAB.

Let's not forget the Java Virtual Machine (JVM) languages, Java and Scala. They work well with Hadoop and Spark, respectively, acting as the data pipelines. Java is an older language that is still prevalent but limited in ML. Scala is easier to use and is the language Spark itself is written in, so it works naturally with Spark and with MLlib, Spark's machine learning library.

The Takeaway

Corporations wishing to lead in the emerging space of machine learning should make plans now to establish their technology framework and nurture the necessary skills, including advanced math. A framework I recommend considering for corporate machine learning is Python with TensorFlow, together with Scala on Spark.

About the Author

McKnight Consulting Group is led by William McKnight. He serves as strategist, lead enterprise information architect, and program manager for sites worldwide utilizing the disciplines of data warehousing, master data management, business intelligence, and big data. Many of his clients have gone public with their success stories. McKnight has published hundreds of articles and white papers and given hundreds of international keynotes and public seminars. His teams’ implementations from both IT and consultant positions have won awards for best practices. William is a former IT VP of a Fortune 50 company and a former engineer of DB2 at IBM, and holds an MBA. He is author of the book Information Management: Strategies for Gaining a Competitive Advantage with Data.
