TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

5 Minutes with a Data Scientist: Shujia Zhang of SafetyCulture

In search of more practical insights from a working data scientist, Upside spoke recently with Shujia Zhang from SafetyCulture, a start-up that is using analytics to enable safe workplaces.

By James E. Powell
October 20, 2016

Third-party analytics services are creating innovative solutions across many industries. One such service is SafetyCulture, an Australia-based start-up committed to enabling safe and efficient workplaces. SafetyCulture's app, iAuditor, allows users to conduct inspections and audits in the field, and the company then provides analytics to reveal insights and trends.

Shujia Zhang is a data scientist at SafetyCulture, and she spoke with Upside recently about her work.

UPSIDE: What's the one thing you wish people knew about your job as a data scientist?

Shujia Zhang: Data scientist is a challenging and multidisciplinary role. A good data scientist is like a Swiss army knife; you need to have business acumen, strong programming skills in multiple languages, a solid foundation in math and statistics, in-depth knowledge of machine learning, experience with big data processing, and storytelling ability. Those who can combine all these skills are powerful and rare; I'm glad to be one of them.

Are you working on anything interesting right now? If not, what's your dream project?

Yes, we have launched a project that aims to unlock the power of data science to better serve our customers -- it's called Project Forest. Through our iAuditor app we have collected an enormous mass of structured and unstructured data, such as events data, usage data, user data, etc., and from this we are going to extract fresh and useful knowledge and business value in real time.

The more insights we get from the data, the better performance improvements we can make to our products -- refining recommendation engines, real-time predictive models, intelligent marketing content, etc. The better our products serve our users, the more likely it is that we will retain customers, who will contribute more data. This is a wonderfully virtuous cycle that both we and our users can benefit from.

What's your favorite part about being a data scientist?

Every day we have so many interesting problems to work on and so much interesting data to play with. As data scientists, we collaborate with different teams in the company, such as marketing, customer success, and product design. From acquisition of new customers to their long-term loyalty, we can contribute in many ways throughout the customer's journey.

What's a personality trait you think people need to succeed at your job?

You need to enjoy facing challenges and endless learning. First, you should be curious about mountains of data. Then you'll explore the data and search for meaningful discoveries in very creative ways (and at optimized cost). When facing a challenge, you should always have high-level plans or solutions. That's why you should never stop learning: so that you are capable of implementing what you plan.

What are the technical skills you think you need to become a data scientist?

Solid foundation of mathematics and statistics
In-depth knowledge of at least one programming language: R, Python, Scala, Java, etc.
Experience with SQL and NoSQL
Experience with Hadoop platforms and big data processing
Experience with unstructured data cleansing, preprocessing, and analysis

What's a typical day like for you?

In the morning, I usually spend some time reviewing existing automated reports, fixing bugs, and refining the error handling if needed. Throughout the day, other teams may ask data questions, which I answer as soon as possible.

When starting on a new task, the first step is collecting and retrieving data from our database, followed by cleaning and pruning the data. Next is determining the proper algorithm or combination of algorithms. Then I create the code for the task and test the algorithm, then evaluate the performance and optimize the analysis.

Afterward, the results can be wrapped up and a visualization of them delivered. The last step is to tell the story of the findings and spread the knowledge.
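For readers who want to see the shape of that workflow, here is a minimal sketch in Python. It is an illustration only, not SafetyCulture's code: the records are synthetic, the field names (`items`, `minutes`) are hypothetical, and a plain least-squares line stands in for whatever algorithm a real task would call for. It walks through retrieving data, cleaning it, fitting a model, and evaluating the model on held-out rows.

```python
import random
import statistics


def load_records(n=200, seed=0):
    """Stand-in for a database query: returns synthetic, hypothetical rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        items = rng.randint(5, 50)
        minutes = 2.0 * items + 10.0 + rng.gauss(0, 3)
        if rng.random() < 0.05:
            minutes = None  # simulate a missing value
        rows.append({"items": items, "minutes": minutes})
    return rows


def clean(rows):
    """Cleaning step: drop rows with a missing target value."""
    return [r for r in rows if r["minutes"] is not None]


def fit_line(rows):
    """Modeling step: ordinary least squares for minutes = slope * items + intercept."""
    xs = [r["items"] for r in rows]
    ys = [r["minutes"] for r in rows]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, rows):
    """Evaluation step: average absolute prediction error on the given rows."""
    slope, intercept = model
    errors = [abs(slope * r["items"] + intercept - r["minutes"]) for r in rows]
    return statistics.mean(errors)


rows = clean(load_records())
split = int(0.8 * len(rows))            # hold out the last 20% for testing
train, test = rows[:split], rows[split:]
model = fit_line(train)
mae = mean_abs_error(model, test)
print(f"slope={model[0]:.2f} intercept={model[1]:.2f} MAE={mae:.2f}")
```

Reporting and visualization, the final steps Zhang describes, would start from values such as `model` and `mae`.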

Where is data analytics/data science headed in the next few years?

Graph data modeling will become more popular as more people realize that data points consist of both their contents and their rich context. Big data science will focus on big and smart data because the variety of data will matter, not just the volume. Graph representation of data could become powerful enough to describe any data domain. I think that the graph modeling technique will enable even more intelligent and autonomous data-driven processes.
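To make the idea of "contents plus context" concrete, here is a small sketch of a property graph in Python. It is an assumption-laden illustration rather than anything described in the interview: the `PropertyGraph` class, the node identifiers, and the relation names are all hypothetical. Each node carries its own contents as properties, and the edges around it supply its context.

```python
class PropertyGraph:
    """Tiny in-memory property graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.nodes = {}   # node_id -> dict of contents
        self.edges = []   # (source, relation, target) triples

    def add_node(self, node_id, **content):
        self.nodes[node_id] = content

    def add_edge(self, source, relation, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append((source, relation, target))

    def context(self, node_id):
        """Return the relationships that surround a node, in both directions."""
        outgoing = [(rel, dst) for src, rel, dst in self.edges if src == node_id]
        incoming = [(rel, src) for src, rel, dst in self.edges if dst == node_id]
        return {"outgoing": outgoing, "incoming": incoming}


graph = PropertyGraph()
graph.add_node("audit-1", kind="audit", score=0.82)      # made-up values
graph.add_node("user-7", kind="inspector")
graph.add_node("site-3", kind="site", name="Warehouse A")
graph.add_edge("user-7", "conducted", "audit-1")
graph.add_edge("audit-1", "located_at", "site-3")

print(graph.nodes["audit-1"])     # the data point's contents
print(graph.context("audit-1"))   # the data point's context
```

The same audit record would be a single row in a table; in the graph it can also be read through who conducted it and where it took place.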

About the Author

James E. Powell is the editorial director of TDWI, including its research reports, the Business Intelligence Journal, and the Upside newsletter.
