Data Scientist (NLP and Classification Expert)
Filevine
This job is no longer accepting applications
Software Engineering, Data Science
Salt Lake City, UT, USA
Posted 6+ months ago
Responsibilities:
- Design, develop, and implement NLP algorithms and techniques for text preprocessing, feature extraction, sentiment analysis, topic modeling, named entity recognition, document classification, and other related tasks.
- Develop robust classification models and frameworks using state-of-the-art machine learning and deep learning techniques for various applications, including document categorization, text classification, sentiment analysis, and recommendation systems.
- Help define workflows and data stores that move data out of unstructured stores and into formats usable for data science.
- Collaborate with cross-functional teams, including product owners, software developers, and domain experts, to understand business requirements and develop end-to-end solutions.
- Perform exploratory data analysis and visualization to gain insights into textual data, identify patterns, and inform feature engineering and model development.
- Evaluate and compare the performance of different models, including but not limited to NLP, generative, and classification models, and propose enhancements or modifications to improve their accuracy, efficiency, and scalability.
- Stay up-to-date with the latest advancements in machine learning methodologies, techniques, and frameworks, and apply them to solve complex business problems.
- Communicate findings, insights, and technical concepts effectively to both technical and non-technical stakeholders through reports, presentations, and visualizations.
- Support implementation of analytics tools and methodologies within our engineering tech stack.
Requirements:
- Master's or Ph.D. degree in Computer Science, Data Science, Statistics, or a related field.
- Strong background and expertise in Natural Language Processing (NLP) techniques, including text preprocessing, feature extraction, sentiment analysis, topic modeling, named entity recognition, and document classification.
- Proven experience in designing and implementing classification models and algorithms, such as Naïve Bayes, Logistic Regression, Support Vector Machines (SVM), Random Forests, Gradient Boosting, and Neural Networks.
- Proficiency in programming languages and frameworks such as Python, Java, or Spark, and in libraries such as NLTK, spaCy, scikit-learn, TensorFlow, or PyTorch.
- Experience with data manipulation, analysis, and visualization using tools such as Pandas, NumPy, and Matplotlib.
- Strong understanding of statistical analysis and machine learning principles, and ability to apply them to real-world legal problems.
- Solid knowledge of software development practices, version control systems, and agile methodologies.
- Excellent problem-solving skills, analytical thinking, and attention to detail.
- Effective communication skills and ability to collaborate in a team-oriented environment.
- Proven track record of delivering high-quality results on time and of effectively managing high-profile projects and priorities.
Preferred Skills:
- Experience with true big data (exabytes and higher) processing practices.
- Knowledge of cloud computing platforms such as AWS, Azure, or GCP.
- Ability to mentor and educate technical groups on Data Science deployment and best practices.
30-Day Goals: Understanding our Data and defining Standard Fields to Fuel Settlement Prediction
- Conduct an in-depth analysis of the unstructured JSON data corpus in our Filevine Core dataset to understand its characteristics, key attributes, and potential challenges, and use the findings to help define a Standard Fields approach for leveraging unstructured data.
- Develop a data preprocessing pipeline using AWS SageMaker to clean, normalize, and transform the JSON data on specific data sets into a structured format suitable for NLP and text classification tasks.
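As a rough illustration of the kind of cleaning and flattening step such a pipeline might include, here is a minimal Python sketch that uses only the standard library. The record layout, the field names (caseId, client, notes), and the flatten_record helper are hypothetical and are not taken from the Filevine Core dataset; a real pipeline would also need to handle arrays, missing values, and dataset-specific rules.

```python
import json
from typing import Any


def flatten_record(record: dict[str, Any], parent_key: str = "", sep: str = ".") -> dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with `sep`.

    String values are whitespace-normalized and lowercased; all other
    values (numbers, lists, None) are passed through unchanged.
    """
    flat: dict[str, Any] = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten_record(value, full_key, sep))
        elif isinstance(value, str):
            # Collapse runs of whitespace (including newlines) and lowercase.
            flat[full_key] = " ".join(value.split()).lower()
        else:
            flat[full_key] = value
    return flat


# Hypothetical example record; the field names are assumptions for illustration.
raw = '{"caseId": 17, "client": {"name": "  Jane   DOE "}, "notes": "Rear-end  collision\\n on I-15"}'
row = flatten_record(json.loads(raw))
print(row)
```

Each flattened row can then be loaded into a tabular structure for feature extraction or classification.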
60-Day Goals: Adding Value to our Data by Creating Standard Fields to Fuel Settlement Prediction
- Develop and fine-tune NLP models for tasks such as named entity recognition, topic modeling, and text categorization using the unstructured data from Filevine Core for standard fields (global).
90-Day Goals: Adding Value to our Data by Creating Standard Fields to Fuel Settlement Prediction
- Continue to develop and fine-tune NLP models for tasks such as named entity recognition, topic modeling, and text categorization using the unstructured data from Filevine Core for standard fields (local).
6-Month Goal: Standard fields are being used to generate a Settlement Prediction/Amount in Beta testing
- Have Standard Fields (Local and Global) available for DS/Analytics use from Filevine Core data.
- Create a model that leverages standard fields (features) to predict settlement likelihood and settlement amount.