Data Scientist/ NLP expert

Job Title: Data Scientist/ NLP expert
Contract Type: Permanent
Location: Mumbai
Reference: 4_13418
Contact Name: Pooja Mungekar
Contact Email:
Job Published: April 13, 2018 17:40

Job Description

Our client, a leading B2B analytics firm, is looking for a Data Scientist with a strong background in Natural Language Processing and Unstructured Data Mining. Prior knowledge of financial markets is not required.

Required Skills

  • Advanced degree from an accredited college/university in Computer Science, Computational Linguistics, Applied Math or Statistics, Engineering, Bioinformatics, Physics, Operations Research, or a related field (strong math/stats background with an ability to understand algorithms and methods from both mathematical and intuitive viewpoints)
  • In-depth knowledge of various NLP domains such as entity extraction, speech recognition, topic modeling, machine translation, natural language understanding, parsing, question answering, etc.
  • Expertise in text mining (probabilistic topic models, word association mining, ontology learning, opinion mining and sentiment analysis, semantic similarity, etc.)
  • Expertise in natural language processing/understanding (word representation, sentiment analysis, relation extraction, natural language inference, semantic parsing, etc.)
  • Excellent background in machine learning (generative models, discriminative models, neural networks, regression, classification, clustering, etc.)
  • Experience in deep learning for NLP/NLU is a big plus
  • Extensive experience using NLP-related techniques and algorithms such as HMMs, CRFs, deep learning and recurrent neural networks, word2vec/doc2vec, Bayesian modeling, etc.
  • Proven success in building strong ontologies and taxonomies
  • Strong data extraction and processing skills and experience
  • Experience in applied statistics including sampling approaches, experiments, modeling, and data mining techniques
  • Experience building analytical models and working with structured and unstructured data sets
  • Deep expertise in implementing algorithms in Python
  • Experience with data structures and algorithms and ability to work in a Unix environment, processing large amounts of data in a big data environment
  • Significant experience building robust data processing and analytics pipelines
  • Experience with one or more modern Big Data technology stacks
  • Contributions to research communities (e.g. ACL, NIPS, ICML, CVPR) are a plus

Responsibilities


  • Develop and apply bleeding-edge machine learning algorithms and statistical pattern recognition on extremely large text corpora in the capital markets domain
  • Use statistical natural language processing to mine unstructured data and create insights; analyze and model structured data using advanced statistical methods; and implement the algorithms and software needed to perform these analyses
  • Build document clustering, topic analysis, text classification, named entity recognition, sentiment analysis, and part-of-speech tagging methods for unstructured and semi-structured data
  • Cluster and analyze large amounts of user-generated content and process data in large-scale environments using Amazon EC2, Storm, Hadoop and Spark
  • Develop and perform text classification using methods such as logistic regression, decision trees, support vector machines and maximum entropy classifiers
  • Perform text mining, generate and test working hypotheses, prepare and analyze historical data and identify patterns
  • Generate creative solutions (patents) and publish research results in top conferences (papers)

Technology Stack for the Resultant Application

  • Data Storage + Analytics: AWS / Cloudera (on-premise) Hadoop ecosystem with MongoDB and Elasticsearch, on S3 or on-premise
  • Queueing System: RabbitMQ
  • Programming: Python
  • Front-End: HTML5 for the web app and Objective-C for iOS; potentially a hybrid framework for iOS and Android
  • CDN: AWS or on-premise routing