
Data Scientist (NLP and Classification Expert)

Filevine

Software Engineering, Data Science
Salt Lake City, UT, USA
Posted on Saturday, July 29, 2023

Responsibilities:

  • Design, develop, and implement NLP algorithms and techniques for text preprocessing, feature extraction, sentiment analysis, topic modeling, named entity recognition, document classification, and other related tasks.
  • Develop robust classification models and frameworks using state-of-the-art machine learning and deep learning techniques for various applications, including document categorization, text classification, sentiment analysis, and recommendation systems.
  • Help define workflows and data stores that move data out of unstructured stores and into formats usable for Data Science.
  • Collaborate with cross-functional teams, including product owners, software developers, and domain experts, to understand business requirements and develop end-to-end solutions.
  • Perform exploratory data analysis and visualization to gain insights into textual data, identify patterns, and inform feature engineering and model development.
  • Evaluate and compare the performance of different models, including but not limited to NLP, generative, and classification models, and propose enhancements or modifications to improve their accuracy, efficiency, and scalability.
  • Stay up-to-date with the latest advancements in machine learning methodologies, techniques, and frameworks, and apply them to solve complex business problems.
  • Communicate findings, insights, and technical concepts effectively to both technical and non-technical stakeholders through reports, presentations, and visualizations.
  • Support implementation of analytics tools and methodologies within our engineering tech stack.
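
To make the document-classification responsibility above concrete, here is a minimal sketch using TF-IDF features and logistic regression from scikit-learn. The example documents, labels, and query are invented for illustration; the production models and data would look quite different.

```python
# Hypothetical sketch of document classification: TF-IDF features feeding
# a logistic-regression classifier. All documents and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "motion to dismiss filed by the defendant",
    "settlement agreement signed by both parties",
    "plaintiff filed a motion for summary judgment",
    "parties reached a settlement after mediation",
]
labels = ["motion", "settlement", "motion", "settlement"]

# Pipeline: vectorize raw text, then fit a linear classifier on the vectors.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(docs, labels)

# Classify a new, unseen document.
pred = clf.predict(["defendant filed a motion to compel"])[0]
```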

Requirements:

  • Master's or Ph.D. degree in Computer Science, Data Science, Statistics, or a related field.
  • Strong background and expertise in Natural Language Processing (NLP) techniques, including text preprocessing, feature extraction, sentiment analysis, topic modeling, named entity recognition, and document classification.
  • Proven experience in designing and implementing classification models and algorithms, such as Naïve Bayes, Logistic Regression, Support Vector Machines (SVM), Random Forests, Gradient Boosting, and Neural Networks.
  • Proficiency in programming languages and platforms such as Python, Spark, or Java, and libraries/frameworks such as NLTK, SpaCy, scikit-learn, TensorFlow, or PyTorch.
  • Experience with data manipulation, analysis, and visualization using tools such as Pandas, NumPy, Matplotlib.
  • Strong understanding of statistical analysis and machine learning principles, and ability to apply them to real-world legal problems.
  • Solid knowledge of software development practices, version control systems, and agile methodologies.
  • Excellent problem-solving skills, analytical thinking, and attention to detail.
  • Effective communication skills and ability to collaborate in a team-oriented environment.
  • Proven track record of delivering high-quality results on time and effectively managing high-profile projects and priorities.

Preferred Skills:

  • Experience with true big data (exabytes and higher) processing practices.
  • Knowledge of cloud computing platforms such as AWS, Azure, or GCP.
  • Ability to mentor and educate technical groups on Data Science deployment and best practices.

30-Day Goals: Understanding our Data and defining Standard Fields to Fuel Settlement Prediction

  • Conduct an in-depth analysis of the unstructured JSON data corpus in our Filevine Core dataset to understand its characteristics, key attributes, and potential challenges, and to help define a Standard Fields approach for leveraging unstructured data.
  • Develop a data preprocessing pipeline to clean, normalize, and transform the JSON data into a structured format suitable for NLP and text classification tasks on specific data sets using AWS Sagemaker.
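
The preprocessing goal above can be sketched in miniature: flatten nested JSON records into tidy rows with normalized text fields, ready for downstream NLP. The field names (`id`, `case.notes`) are placeholders, since the actual Filevine Core schema is not specified here, and a real pipeline would run on AWS SageMaker rather than locally.

```python
import json
import re

# Hypothetical sketch: turn an unstructured JSON corpus into structured
# rows for NLP. Field names are placeholders, not the real schema.

def normalize_text(text: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def flatten_record(record: dict, prefix: str = "") -> dict:
    """Recursively flatten nested JSON objects into dot-delimited keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_record(value, prefix=f"{full_key}."))
        else:
            flat[full_key] = value
    return flat

def preprocess(raw_json: str) -> list[dict]:
    """Parse a JSON array of records; flatten and normalize string fields."""
    rows = []
    for record in json.loads(raw_json):
        flat = flatten_record(record)
        rows.append({k: normalize_text(v) if isinstance(v, str) else v
                     for k, v in flat.items()})
    return rows
```

The flattened dot-delimited keys map naturally onto columns of a Pandas DataFrame or a SageMaker training channel.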

60-Day Goals: Adding Value to our Data by Creating Standard Fields to Fuel Settlement Prediction

  • Develop and fine-tune NLP models for tasks such as named entity recognition, topic modeling, and text categorization using the unstructured data from Filevine Core for standard fields (global).

90-Day Goals: Adding Value to our Data by Creating Standard Fields to Fuel Settlement Prediction

  • Continue to develop and fine-tune NLP models for tasks such as named entity recognition, topic modeling, and text categorization using the unstructured data from Filevine Core for standard fields (local).

6-Month Goals: Standard fields are being used to generate a Settlement Prediction/Amount in Beta testing

  • Have Standard Fields (Local and Global) available for DS/Analytics use from Filevine Core data.
  • Create a model that leverages standard fields (features) to predict settlement likelihood and settlement amount.
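
A toy version of that 6-month goal might look like the sketch below: a gradient-boosting classifier that scores settlement likelihood from standard fields. The feature names and the synthetic data are invented for illustration, and scikit-learn's `GradientBoostingClassifier` stands in for whatever model is ultimately chosen; predicting a settlement amount would add a companion regressor on the same features.

```python
# Hypothetical sketch: predict settlement likelihood from standard fields.
# Features and labels are synthetic; this is an illustration, not the
# production model.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Invented standard fields: claimed damages, case age (days), filing count.
X = np.column_stack([
    rng.lognormal(10, 1, n),   # claimed_damages
    rng.integers(30, 720, n),  # case_age_days
    rng.integers(1, 40, n),    # num_filings
])
# Synthetic label: settlements more likely for older cases with smaller claims.
y = ((X[:, 1] > 300) & (X[:, 0] < np.median(X[:, 0]))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Per-case settlement likelihood in [0, 1].
proba = model.predict_proba(X_test)[:, 1]
```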